Abstract. We investigate, both theoretically and numerically, the knotting probability after a local strand passage is performed in an unknotted self-avoiding polygon on the simple cubic lattice. In the polygons studied, it is assumed that two polygon segments have already been brought close together for the purpose of performing a strand passage. This restricts the polygons considered to those that contain a specific pattern, called $\Theta$, at a fixed location; an unknotted polygon containing $\Theta$ is called a $\Theta$-SAP. It is proved that the number of $n$-edge $\Theta$-SAPs grows exponentially (with $n$) at the same rate as the total number of $n$-edge unknotted self-avoiding polygons (those with no prespecified strand passage structure). Furthermore, it is proved that the same holds for subsets of $n$-edge $\Theta$-SAPs that yield a specific after-strand-passage knot-type. Thus the probability of a given after-strand-passage knot-type does not grow (or decay) exponentially with $n$. Instead, it is conjectured that these after-strand-passage knot probabilities approach, as $n$ goes to infinity, knot-type dependent amplitude ratios lying strictly between 0 and 1. This conjecture is supported by numerical evidence from Monte Carlo data generated using a composite (aka multiple) Markov chain Monte Carlo BFACF algorithm developed to study $\Theta$-SAPs. A new maximum likelihood method is used to estimate the critical exponents relevant to this conjecture. We also obtain strong numerical evidence that the after-strand-passage knotting probability depends on the local structure around the strand passage site. If the local structure and the crossing-sign at the strand passage site are considered, then we observe that the more "compact" the local structure, the less likely the after-strand-passage polygon is to be knotted. This trend for compactness versus knotting probability is consistent with results obtained for other strand-passage models; however, we are the first to note the influence of the crossing-sign information. We use two measures of "compactness": one involves the size of a smallest polygon that contains the structure and the other is in terms of an "opening" angle. The opening angle definition is consistent with one that is measurable in single-molecule DNA experiments. The theoretical and numerical approaches presented here are more broadly applicable to other self-avoiding polygon models.



1. Introduction 

Experimental evidence indicates that enzymes (type II topoisomerases) act locally in DNA to 
perform a strand passage (two strands of the DNA which are close together pass through one 
another) in order to disentangle the DNA so that normal cellular processes can proceed [1]. Given
that these enzymes only act locally, the DNA experiments of Rybenkov et al [2] show that type II 
topoisomerases reduce knotting (a global property) in DNA remarkably efficiently (the steady-state 
fraction of knots was found to be as much as 80 times lower than at equilibrium). Experimentalists
have not yet completely characterized this topoisomerase-DNA interaction mechanism, and hence 
several models for studying it have been developed. Proposed mechanisms (see [3, 4] for reviews) 
include those that assume that topo II actively bends (the active bending model) or moves along 
the DNA (kinetic proof-reading [5]) before performing a strand passage and those that assume 
that topo II acts preferentially at locations in the DNA that have a specific pre-formed local 
conformation or "juxtaposition" shape, with a preference for a "hook-like" shape. To study the 
proposed interaction mechanisms, various random polygon strand-passage models have been used: 
from worm-like chains to freely-jointed chains to self-avoiding lattice polygons [3,6-13]. While 
the worm-like chain models are the closest to being DNA-like, the simpler lattice models have 
the advantage that the excluded volume property can be easily incorporated and that they are 
amenable to combinatorial and asymptotic analysis. In addition, lattice polygon models (see 
for example [14]) can exhibit similar scaling behaviour with respect to knot localization as that 
observed in DNA knot experiments [15]. 

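One point for comparison between the models and experiments is the knot reduction factor, $R_K$, introduced in [12]:

$$
R_K = \frac{P^{eq}(\bar\phi)/P^{eq}(\phi)}{P^{st}(\bar\phi)/P^{st}(\phi)}
    = \frac{\text{ratio of knots } (\bar\phi) \text{ to unknots } (\phi) \text{ at equilibrium (eq)}}{\text{ratio of knots } (\bar\phi) \text{ to unknots } (\phi) \text{ at steady-state (st)}},
\tag{1}
$$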
where $R_K > 1$ indicates that the ratio of knots to unknots at steady state is smaller than at equilibrium and thus knotting has been reduced. For the experiments, the phrase "thermodynamic equilibrium" refers to the distribution of knots resulting from random cyclization of a linear duplex DNA with cohesive ends, and the phrase "steady-state" refers to the corresponding distribution after a topoisomerase-catalyzed reaction (with continuous ATP hydrolysis) has reached its steady state. For a random polygon strand-passage model, "equilibrium" typically refers to the distribution of knots over all possible random polygon conformations, and "steady-state" refers to the knot distribution that results when transitions from one polygon conformation to another can only occur via a specified local strand-passage mechanism. For the simplest 2-state model, where a polygon is either unknotted ($\phi$) or knotted ($\bar\phi$), the knot reduction factor reduces to [12]

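$$
R_K = \frac{P^{eq}(\bar\phi)}{P^{eq}(\phi)} \cdot \frac{t_{\bar\phi \to \phi}}{t_{\phi \to \bar\phi}},
\tag{2}
$$

where $t_{a \to b}$ is the one-step transition probability for going from state $a \in \{\phi, \bar\phi\}$ to state $b \in \{\phi, \bar\phi\}$.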
Thus for a fixed equilibrium distribution, the one-step transition probabilities for a strand-passage 
model determine the knot reduction factor. Given a random polygon model, one goal is to vary 
the strand-passage mechanism in order to determine factors that result in the most knot reduction; 
these are candidate factors for playing a role in the actual topoisomerase-DNA interaction. In this 
paper, as a first step towards this, we do not calculate knot reduction factors but instead focus 
on the theoretical and numerical investigation of the one-step transition knotting probability $t_{\phi \to \bar\phi}$
(and related quantities) for a lattice polygon model of strand passage. 
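To make the link between (1) and (2) concrete, the following minimal Python sketch computes $R_K$ both ways for a 2-state chain. The steady state of a two-state Markov chain satisfies $P^{st}(\phi)\,t_{\phi\to\bar\phi} = P^{st}(\bar\phi)\,t_{\bar\phi\to\phi}$, and substituting this into (1) gives (2). The function names and the input probabilities are hypothetical, chosen only to exercise the formulas.

```python
import math


def knot_reduction_factor(p_eq_knot, p_eq_unknot, p_st_knot, p_st_unknot):
    """Equation (1): knot/unknot ratio at equilibrium over the ratio at steady state."""
    return (p_eq_knot / p_eq_unknot) / (p_st_knot / p_st_unknot)


def two_state_steady_state(t_unknot_to_knot, t_knot_to_unknot):
    """Stationary (unknot, knot) probabilities of a 2-state Markov chain."""
    total = t_unknot_to_knot + t_knot_to_unknot
    return t_knot_to_unknot / total, t_unknot_to_knot / total


def knot_reduction_factor_2state(p_eq_knot, p_eq_unknot, t_unknot_to_knot, t_knot_to_unknot):
    """Equation (2): R_K from equilibrium probabilities and one-step transition probabilities."""
    return (p_eq_knot / p_eq_unknot) * (t_knot_to_unknot / t_unknot_to_knot)


# Hypothetical inputs (not values from the paper).
p_eq_knot, p_eq_unknot = 0.02, 0.98
t_uk, t_ku = 0.001, 0.4          # t_{phi -> phi-bar} and t_{phi-bar -> phi}
p_st_unknot, p_st_knot = two_state_steady_state(t_uk, t_ku)

r_eq1 = knot_reduction_factor(p_eq_knot, p_eq_unknot, p_st_knot, p_st_unknot)
r_eq2 = knot_reduction_factor_2state(p_eq_knot, p_eq_unknot, t_uk, t_ku)
print(f"R_K from (1): {r_eq1:.4f}, from (2): {r_eq2:.4f}")
```

Both routes give the same value, and with these inputs $R_K > 1$, i.e. knotting is reduced at steady state.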

In 2000 [6], we proposed the first lattice polygon model for studying strand passage. Assuming a dilute solution and good solvent conditions, in [6] we considered a ring polymer in which two segments of the polymer have already been brought close together for the purpose of performing a local strand passage. The conformations of the ring polymer are represented by self-avoiding polygons (SAPs) on the simple cubic lattice containing a specific structure $\Theta$ (located at the strand passage site; see figure 1(a) in section 2); such SAPs are referred to as $\Theta$-SAPs. Our particular choice for the strand-passage structure was motivated initially by its similarity to the shape proposed by Berger et al [16] for the topoisomerase-DNA complex; it should be noted, however, that the precise shape of the topoisomerase-DNA complex is still an open question. In our model, each equal-length $\Theta$-SAP (with $\Theta$ fixed at the origin) is considered to be equally likely as a possible polymer configuration. Consequently we do not address how the strand passage site was identified and formed, nor the effect of different solvent conditions. In the $\Theta$-SAP model, a strand passage is performed at $\Theta$ only if the lattice sites between its two strands are empty (see figure 1(a)); in this case $\Theta$ is called $\Theta_0$ and polygons containing it are called $\Theta_0$-SAPs. Strand passage is then performed by replacing $\Theta_0$ by the structure $S$ as shown in figure 1(b), and the result is a lattice polygon.

For this model, we have investigated both numerically and theoretically [6, 7] the polygon-length dependence of the knotting probability. Specifically, for the $\Theta$-SAP model and a given polygon length $n$, consider the one-step transition knot probability from the unknot to knot-type $K$:



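$$
t_n(\phi \to K)
 = \underbrace{\frac{p_n^{\Theta_0}}{p_n(\phi)}}_{\substack{\text{prob. } \Theta_0 \text{ occurs at}\\ \text{strand passage site}}}
 \cdot
 \underbrace{\frac{p_n^{\Theta_0}(K)}{p_n^{\Theta_0}}}_{\substack{\text{prob. strand passage at}\\ \Theta_0 \text{ results in knot } K}},
\tag{3}
$$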

where $p_n(\phi)$ is the number of $n$-edge unknotted SAPs rooted at the origin, $p_n^{\Theta_0}$ is the number of these that contain $\Theta_0$, and $p_n^{\Theta_0}(K)$ is the number of the latter that yield a knot-type $K$ SAP after a single strand passage is performed at $\Theta_0$. The knotting probability $t_n(\phi \to \bar\phi)$ (needed for the denominator of $R_K$) is then given by $1 - t_n(\phi \to \phi)$. Alternatively, for the restricted equilibrium in which only $\Theta_0$-SAPs are considered, the $\Theta_0$-restricted knot probability, $\rho_n^{\Theta_0}(K)$, is given by the second ratio above, namely:

$$
\rho_n^{\Theta_0}(K) = \frac{p_n^{\Theta_0}(K)}{p_n^{\Theta_0}}.
\tag{4}
$$
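As a small illustration of (3) and (4), the sketch below evaluates both probabilities from polygon counts at a single fixed $n$, using exact rational arithmetic; note that the product in (3) simplifies to $p_n^{\Theta_0}(K)/p_n(\phi)$, while the $\Theta_0$-restricted probability is the second ratio alone. The counts and the function names are hypothetical placeholders, not data from [6, 7].

```python
from fractions import Fraction


def transition_knot_probability(p_unknot, p_theta0, p_theta0_K):
    """t_n(phi -> K), equation (3): P(Theta_0 at the site) * P(K | passage at Theta_0)."""
    prob_theta0 = Fraction(p_theta0, p_unknot)
    prob_K_given_theta0 = Fraction(p_theta0_K, p_theta0)
    return prob_theta0 * prob_K_given_theta0


def restricted_knot_probability(p_theta0, p_theta0_K):
    """Theta_0-restricted knot probability of equation (4)."""
    return Fraction(p_theta0_K, p_theta0)


# Hypothetical counts for one fixed n (illustration only).
p_unknot = 1000                                   # p_n(phi)
by_knot = {"unknot": 45, "3_1": 4, "4_1": 1}      # p_n^{Theta_0}(K) for each K
p_theta0 = sum(by_knot.values())                  # p_n^{Theta_0} = 50

rho = {K: restricted_knot_probability(p_theta0, c) for K, c in by_knot.items()}
t_trefoil = transition_knot_probability(p_unknot, p_theta0, by_knot["3_1"])
rho_knotted = 1 - rho["unknot"]                   # knotting probability after passage
print(t_trefoil, rho["3_1"], rho_knotted)
```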

In either case, $K$ must either be the unknot or an unknotting number one knot, and we denote the set of all such $K$ by $\mathcal{K}$. In [7], combinatorial bounds are proved which relate the polygon counts just defined. These bounds yield that the numbers of $n$-edge unknotted $\Theta$-SAPs and $\Theta_0$-SAPs each grow exponentially (with $n$) at the same rate as the total number of $n$-edge unknotted self-avoiding polygons. Thus, for example, $\lim_{n\to\infty} n^{-1}\log p_n(\phi) = \lim_{n\to\infty} n^{-1}\log p_n^{\Theta_0}$. Furthermore, it is proved that the same holds for each subset of $n$-edge unknotted $\Theta_0$-SAPs that yields a specific after-strand-passage knot-type. Thus, for example, the $\Theta_0$-restricted knotting probability, $\rho_n^{\Theta_0}(\bar\phi) = 1 - \rho_n^{\Theta_0}(\phi)$, does not grow (or decay) exponentially with $n$. Based on a heuristic argument, it is conjectured that the leading asymptotic forms (as $n$ goes to infinity) of $p_n(\phi)$, $p_n^{\Theta_0}$ and $p_n^{\Theta_0}(K)$ are all the same, up to a positive constant, and hence $0 < \lim_{n\to\infty} \rho_n^{\Theta_0}(\phi) < 1$ and $0 < \lim_{n\to\infty} t_n(\phi \to \phi) < 1$.
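Spelled out, the conjecture can be written as follows. Here $\rho_n^{\Theta_0}(K)$ is the ratio in (4), $\kappa_0$ is the growth constant of unknotted polygons (defined in section 2.1), and $A_0$, $B$, $B_K$ (positive amplitudes) and $\alpha_0$ (a common exponent) are our own labels for unknown constants; this restates the heuristic and is not a proved result:

```latex
\begin{gather*}
p_n(\phi) \sim A_0\, n^{\alpha_0} e^{\kappa_0 n}, \qquad
p_n^{\Theta_0} \sim B\, n^{\alpha_0} e^{\kappa_0 n}, \qquad
p_n^{\Theta_0}(K) \sim B_K\, n^{\alpha_0} e^{\kappa_0 n},\\
\rho_n^{\Theta_0}(K) = \frac{p_n^{\Theta_0}(K)}{p_n^{\Theta_0}} \longrightarrow \frac{B_K}{B},
\qquad
t_n(\phi \to K) = \frac{p_n^{\Theta_0}(K)}{p_n(\phi)} \longrightarrow \frac{B_K}{A_0}.
\end{gather*}
```

The factor $n^{\alpha_0} e^{\kappa_0 n}$ cancels in each ratio, which is why the limits are amplitude ratios; the conjectured strict bounds on $\lim \rho_n^{\Theta_0}(\phi)$ correspond to $0 < B_\phi < B$.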

In [6], a composite Markov chain (CMC) Monte Carlo (also known as multiple Markov chain) algorithm was developed for studying $\Theta$-SAPs with any given fixed knot-type. This algorithm, called the CMC $\Theta$-BFACF algorithm, is based on the BFACF algorithm [17-19]. Also in [6], an ergodicity proof was given for the $\Theta$-BFACF algorithm, which was a non-trivial extension of the Janse van Rensburg and Whittington [20] ergodicity proof for the BFACF algorithm. Most recently, in [7], improved statistical methods were developed for estimating the length-dependence of the knot probabilities from CMC Monte Carlo data, and Monte Carlo data were then used to investigate the conjectures discussed above in relation to equation (4). The approaches developed to study the $\Theta$-SAP model in [7] are expected to be broadly applicable to any self-avoiding polygon strand passage model. In this paper, we review the theoretical results and numerical/statistical methods from [7] and present new results based on additional Monte Carlo data beyond that used in [7].

Since 2000, there have been a number of different research groups investigating the one-step transition knot probabilities (e.g. [3, 8-13]) for a variety of (on- and off-lattice) strand-passage models. The main advantage of the $\Theta$-SAP model over the newer models is that it has been possible to prove results about it. In contrast, little if anything has been proved about the newer models, and each model has aspects (e.g. off-lattice polygons or virtual strand passages) which make mathematical rigour a challenge. The newer strand passage models [3, 8-13] and their studies have, however, raised a number of important questions. The most important of these with respect to modelling DNA is: how do the knot probabilities depend on the local juxtaposition geometry around the strand-passage site? For example, from the strand-passage model studies of [12, 13], it is observed that the "tightness" or "compactness" of the local juxtaposition geometry affects the knot reduction factor and the knotting probabilities. Specifically, in 2006, Liu et al [12] investigated knotting probabilities after a local "virtual" strand passage in SAPs in $\mathbb{Z}^3$; the strand passage is termed virtual since the after-strand-passage polygon need not be a SAP in $\mathbb{Z}^3$. They investigated knotting (and unknotting) probabilities as a function of the local juxtaposition geometry of the two polygon segments involved in the virtual strand passage, and highlighted their results for three classes of juxtapositions (from most to least compact): "hooked", "half-hooked", and "free" (cf. [12, table 1]). They found that strand passages about a hooked juxtaposition (when compared to the other two types) had the lowest knotting probability (essentially zero) and those about a free juxtaposition had the highest. In [13], they obtained similar conclusions for an off-lattice model; specifically, from [13, table I], the probabilities of knotting for hooked, half-hooked, and free juxtapositions are reported to be, respectively, 0.0028, 0.0077, and 0.1014.

The structure $\Theta$ resembles the half-hooked juxtaposition of [12], but it is not the same. Furthermore, for the $\Theta$-SAP model, we only consider a strand passage that yields a SAP in $\mathbb{Z}^3$. Thus it is not possible to directly compare the Liu et al results to any from the $\Theta$-SAP model. However, it is possible to investigate how the knotting probabilities for $\Theta_0$-SAPs depend on the local geometry immediately adjacent to the fixed structure $\Theta_0$. Based on the Liu et al results, it is expected that the knotting probabilities do depend on this local geometry. For the $\Theta$-SAP model, the geometry-dependent one-step transition knot probability for $K \in \mathcal{K}$ is given by:



$$
t_n^G(\phi \to K) = \frac{p_n^G}{p_n(\phi)} \cdot \frac{p_n^G(K)}{p_n^G},
\tag{5}
$$

where $p_n^G$ is the number of $n$-edge unknotted $\Theta_0$-SAPs that have a specified local juxtaposition geometry $G$ and $p_n^G(K)$ is the number of these that yield a knot-type-$K$ SAP after a single strand passage is performed at $\Theta_0$. Alternatively, for the restricted equilibrium in which only $\Theta_0$-SAPs with local geometry $G$ are considered, the $G$-restricted knot probability, $\rho_n^G(K)$, is given by the second ratio above, namely:

$$
\rho_n^G(K) = \frac{p_n^G(K)}{p_n^G}.
\tag{6}
$$

For our numerical investigation, we focus on $\rho_n^G(\bar\phi) = 1 - \rho_n^G(\phi)$ and explore its dependence on $G$. Because the hooked, half-hooked, and free local geometry classifications do not translate to the $\Theta$-SAP model, we need alternative schemes to classify the local geometry. In this paper, we propose two new compactness classification schemes and show that, in each case, the corresponding knotting probabilities decrease as compactness increases. In both schemes, we find that the sign of the crossing at the strand passage site plays a significant role. This latter observation is consistent with experimental work on DNA which indicates that some topoisomerases exhibit a chirality bias [21-24].

The purpose of this paper is thus two-fold. Our first goal is to summarize the theoretical results and conjectures, and the numerical methods, from [7] regarding the knot probabilities. Using these methods, the conjectures are then tested on new data beyond that which was available in [7]. Our second goal is to investigate, for the first time, the effect of the local juxtaposition geometry on the $\Theta$-SAP model knotting probabilities. To do this, we first extend the combinatorial arguments developed in [7] to obtain analogous results and conjectures about the asymptotic properties of $p_n^G(K)$. We then use the numerical methods of [7] to investigate numerically the dependence of $\rho_n^G(\bar\phi)$ on $n$ and $G$. We establish first that it is highly dependent on the crossing-sign at the strand passage site. In order to determine which factors are most influential on the knotting probability, we investigate two geometric properties of the juxtapositions: an "opening" angle and a "compactness" measure. We find a trend for "compactness" versus knotting probability which is consistent with results obtained for other strand-passage models [12, 13]. We also find an "opening" angle versus knotting probability trend which is noteworthy in light of recent experimental results [21]. In particular, our "opening" angle is defined to be consistent with the angle defined in [21]. They find that topo IV (a type II topoisomerase) binds preferentially when this angle is slightly acute; we find that the more acute the angle, the lower the knotting probability.

In summary, in the next section of the paper (section 2), the $\Theta$-SAP model for local strand passage in unknotted ring polymers is defined. We then investigate, both theoretically and numerically, the distribution of knots obtained after performing a strand passage in an unknotted $n$-edge $\Theta_0$-SAP. Given $K \in \mathcal{K}$, we heuristically argue (in section 2.1) that its probability of resulting from a strand passage at $\Theta_0$ depends on $n$ and approaches (as $n \to \infty$) a limit lying strictly between 0 and 1. We prove (in section 2.1) that the rate of approach to the limit is less than exponential. In order to investigate the knot distribution further as a function of polygon length, the CMC $\Theta$-BFACF algorithm is used. The ergodicity classes for this algorithm are discussed in section 3. Then the maximum likelihood estimation approach for analyzing CMC data from [7] is reviewed (see section 4.1 and appendix A) and used to provide evidence supporting our heuristic arguments (see section 4.1). Our best estimates of the $\Theta_0$-restricted knot probabilities are then presented in section 4.2. Finally, we present evidence that the probability of going from an unknot to a knot for $\Theta_0$-SAPs does depend on the local structure around the strand-passage site, and especially on the crossing-sign at the strand-passage site (see section 5).

2. The unknotted $\Theta$-SAP model

An $n$-edge self-avoiding polygon (SAP or polygon, for short) is an $n$-edge connected subgraph of the simple cubic lattice $\mathbb{Z}^3$ with each vertex having degree two. For SAPs, the number of edges, $n$, must be greater than 3 and even, and this will be assumed henceforth. In our model, we assume that two strands of the polymer have already been "pinched" together for the purpose of implementing a strand passage. To model the pinched portion of the ring polymer, the SAPs used are required
Figure 1. (a) The fixed strand passage structure $\Theta$: open and empty circles represent its vertices and open bonds represent its edges. Dashed lines and circles containing asterisks represent, respectively, lattice edges and vertices that $\Theta$ does not occupy but which may be occupied in a $\Theta$-SAP. In the case that the circles containing asterisks are not allowed to be occupied, the strand passage structure is called $\Theta_0$. $A = (1,0,0)$; $B = (0,0,0)$; $C = (-1,0,0)$; $D = (0,-1,-2)$; $E = (0,-1,-3)$; $F = (0,0,-3)$; $G = (0,1,-3)$; and $H = (0,1,-2)$. (b) The after-strand-passage structure $S$: open and empty circles represent its vertices and open bonds represent its edges. The circles containing asterisks are vertices in $\Theta$ not occupied by $S$. (c) An unknotted 14-edge $\Theta_0$-SAP $\omega$ and (d) the corresponding 18-edge after-strand-passage polygon $\omega_S$. (e) The $\Theta_0^-$-SAP $\tilde\omega$ obtained via the mirror operation $\sim$ from the $\Theta_0^+$-SAP $\omega$ of (c).

between the two strands of $\Theta$ is needed to ensure that all $\Theta$-SAPs can be generated via a sequence of BFACF moves (this is explained further in section 3).
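The definitions above translate directly into checks on a closed list of lattice vertices. The following Python sketch tests whether such a list is a SAP and whether it contains $\Theta$ at the position given in the caption of figure 1. It assumes (our reading of figure 1(a), not stated explicitly in the text) that the edges of $\Theta$ are the top strand $A$-$B$-$C$ and the bottom strand $D$-$E$-$F$-$G$-$H$. The 14-edge example polygon was built by hand for illustration and is not claimed to be the polygon drawn in figure 1(c); the $\Theta_0$ condition is not checked, since the asterisked sites are not listed in the text.

```python
# Coordinates of the vertices of Theta, from the caption of figure 1.
THETA = {
    "A": (1, 0, 0), "B": (0, 0, 0), "C": (-1, 0, 0),
    "D": (0, -1, -2), "E": (0, -1, -3), "F": (0, 0, -3),
    "G": (0, 1, -3), "H": (0, 1, -2),
}
# ASSUMPTION: Theta consists of the top strand A-B-C and the bottom strand D-E-F-G-H.
THETA_EDGES = [("A", "B"), ("B", "C"), ("D", "E"), ("E", "F"), ("F", "G"), ("G", "H")]


def is_sap(vertices):
    """True if the cyclic vertex list is an n-edge self-avoiding polygon in Z^3."""
    n = len(vertices)
    if n < 4 or n % 2 != 0:          # n > 3 and even (evenness is implied by closure)
        return False
    if len(set(vertices)) != n:      # self-avoiding: no vertex visited twice
        return False
    for i in range(n):               # cyclically consecutive vertices are lattice neighbours
        a, b = vertices[i], vertices[(i + 1) % n]
        if sum(abs(p - q) for p, q in zip(a, b)) != 1:
            return False
    return True


def edges_of(vertices):
    """Set of undirected edges of the polygon."""
    n = len(vertices)
    return {frozenset((vertices[i], vertices[(i + 1) % n])) for i in range(n)}


def contains_theta(vertices):
    """True if every (assumed) edge of Theta is an edge of the polygon."""
    edges = edges_of(vertices)
    return all(frozenset((THETA[u], THETA[v])) in edges for u, v in THETA_EDGES)


# A hand-built 14-edge polygon through Theta (illustrative only).
omega = [
    (0, 0, 0), (-1, 0, 0), (-1, 1, 0), (-1, 1, -1), (-1, 1, -2),
    (0, 1, -2), (0, 1, -3), (0, 0, -3), (0, -1, -3), (0, -1, -2),
    (1, -1, -2), (1, -1, -1), (1, -1, 0), (1, 0, 0),
]
print(len(omega), is_sap(omega), contains_theta(omega))
```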

To perform a strand passage at $\Theta$ in a $\Theta$-SAP, $\Theta$ is replaced by the fixed after-strand-passage structure $S$ as illustrated in figure 1(b). The vertices in figure 1(a) that are represented by circles containing asterisks must not be end points of any edge of the initial polygon in order for this strand passage to yield a lattice polygon; hence strand passage is only performed at $\Theta_0$. For a $\Theta_0$-SAP $\omega$, the polygon $\omega_S$ obtained by replacing $\Theta_0$ with $S$ is referred to as the resulting after-strand-passage polygon. (Figure 1(c) is an illustration of a 14-edge $\Theta_0$-SAP and figure 1(d) is an illustration of the after-strand-passage polygon obtained from it.) When $\omega_S$ has knot-type $K$, $\omega$ is called a $\Theta_0(K)$-SAP.
Note that $\omega_S$ has four more edges than $\omega$; the extra edges ensure that $\omega_S$ is a lattice polygon. Thus, strictly speaking, the transition knot probabilities calculated using this model will not correspond to those for a realizable steady state of fixed polygon-length lattice SAPs. However, we argue that this is no worse than for the lattice strand-passage models of [10, 12], which do not yield an after-strand-passage lattice polygon. Furthermore, just as for these other models, we expect the $\Theta$-SAP model to predict qualitative trends that are broadly applicable. In addition, we mitigate this problem somewhat by considering averages over polygons of varying lengths when calculating knot probabilities (see section 4.2).

For the remainder of the paper, our focus is on unknotted $\Theta$-SAPs only and, unless stated otherwise, the terms $\Theta$-SAP and $\Theta_0$-SAP refer only to those that are unknotted.

The polygon conformation around $\Theta_0$ will be used to investigate juxtaposition-geometry effects on knotting. To do this, we define a $\Theta_0$-SAP $\omega$'s juxtaposition, $J$, by the vertices $(v_1, v_2, v_3, v_4)$ of $\omega$ that are not in $\Theta$ but are, respectively, immediately adjacent to the vertices $A$, $C$, $D$ and $H$ of $\Theta$. In this case, $\omega$ is called a $J$-SAP and, when $\omega_S$ has knot-type $K$, a $J(K)$-SAP. There are 144 juxtapositions $J$ that can occur in a $\Theta_0$-SAP, and we denote the set of these juxtapositions by $\mathcal{J}$. See figure 2 for examples of $J \in \mathcal{J}$. Since we will refer to these examples further, we name them according to the shape of the top segment of the juxtaposition. That is, we name them respectively (from left to right in figure 2) as S (for straight top), L (for L-shaped top) and Z (for Z-shaped top).

Depending on how the endpoints of $\Theta$ are paired and joined to form the polygon $\omega$, the projection of $\Theta$ into the $z = 0$ plane results in either a positively (+) or negatively (-) signed crossing (according to a right-hand rule). In the former case, outside $\Theta$, $C$ is always directly connected to $H$ in $\omega$, while in the latter case $C$ is always directly connected to $D$. For the (+) case, $\Theta$ is labelled $\Theta^+$, $\omega$'s juxtaposition is labelled $J^+$, and $\omega$ is called a $\Theta^+$-SAP and a $J^+$-SAP; in the (-) case, $\Theta$ is $\Theta^-$, the juxtaposition is $J^-$, and $\omega$ is a $\Theta^-$- and a $J^-$-SAP. Thus each $J \in \mathcal{J}$ has a (+) and a (-) version. For each $\sigma \in \{+, -\}$, we use $J^\sigma(K)$-SAP to refer to any $J^\sigma$-SAP whose after-strand-passage polygon has knot-type $K \in \mathcal{K}$. Figure 3 displays examples of signed juxtapositions; the arrows indicate how the end points are joined in any polygon containing the juxtaposition (by convention, we always orient the top strand of $\Theta$ from $A$ to $C$).
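To illustrate how $J$ and the crossing sign can be read off a polygon, the following Python sketch extracts $(v_1, v_2, v_3, v_4)$ and applies the sign rule above, reading the top-strand endpoint as $C = (-1,0,0)$: the sign is (+) if, outside $\Theta$, $C$ is joined to $H$, and (-) if $C$ is joined to $D$. It uses the figure 1 coordinates and a hand-built 14-edge polygon as illustrative input (not the polygon in figure 1(c)); the helper names are hypothetical. It also checks that the mirror map $(x, y, z) \mapsto (-x, y, z)$ flips the sign.

```python
# Coordinates of the vertices of Theta, from the caption of figure 1.
THETA = {
    "A": (1, 0, 0), "B": (0, 0, 0), "C": (-1, 0, 0),
    "D": (0, -1, -2), "E": (0, -1, -3), "F": (0, 0, -3),
    "G": (0, 1, -3), "H": (0, 1, -2),
}
THETA_SITES = set(THETA.values())


def outside_neighbour(vertices, label):
    """The polygon neighbour of endpoint `label` that is not a vertex of Theta."""
    i = vertices.index(THETA[label])
    candidates = (vertices[i - 1], vertices[(i + 1) % len(vertices)])
    return next(w for w in candidates if w not in THETA_SITES)


def juxtaposition(vertices):
    """(v1, v2, v3, v4): vertices outside Theta adjacent to A, C, D and H."""
    return tuple(outside_neighbour(vertices, lbl) for lbl in ("A", "C", "D", "H"))


def crossing_sign(vertices):
    """'+' if the arc leaving C (away from B) reaches H first, '-' if it reaches D."""
    n = len(vertices)
    i = vertices.index(THETA["C"])
    step = 1 if vertices[(i + 1) % n] != THETA["B"] else -1
    j = (i + step) % n
    # The arc from C must end at D or H, because Theta's bottom strand lies on the polygon.
    while vertices[j] not in (THETA["D"], THETA["H"]):
        j = (j + step) % n
    return "+" if vertices[j] == THETA["H"] else "-"


def mirror(vertices):
    """The reflection (x, y, z) -> (-x, y, z)."""
    return [(-x, y, z) for x, y, z in vertices]


omega = [
    (0, 0, 0), (-1, 0, 0), (-1, 1, 0), (-1, 1, -1), (-1, 1, -2),
    (0, 1, -2), (0, 1, -3), (0, 0, -3), (0, -1, -3), (0, -1, -2),
    (1, -1, -2), (1, -1, -1), (1, -1, 0), (1, 0, 0),
]
print(juxtaposition(omega), crossing_sign(omega), crossing_sign(mirror(omega)))
```

The mirror fixes $B$, $D$, $E$, $F$, $G$, $H$ and swaps $A$ and $C$, so the arc that left $A$ towards $D$ becomes an arc leaving $C$ towards $D$, which is why the sign flips.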

Depending on the extent to which the local geometry, $G$, at the strand passage site is specified, we can define associated polygon counts. That is, for each $G \in \{\Theta, \Theta^+, \Theta^-, \Theta_0, \Theta_0^+, \Theta_0^-, J, J^+, J^- : J \in \mathcal{J}\}$, $p_n^G$ and $p_n^G(K)$ are defined respectively as the number of $n$-edge $G$-SAPs and $G(K)$-SAPs. Define also $p_n^G(\bar\phi) := p_n^G - p_n^G(\phi)$.

In addition, consider the reflection or "mirror" operation "$\sim$" which takes unoriented $(x, y, z) \in \mathbb{Z}^3$ to $(-x, y, z) \in \mathbb{Z}^3$. Figures 1(c) and (e) illustrate two 14-edge $\Theta_0$-SAPs that are related via this reflection. In fact $\sim$ provides a one-to-one mapping between the sets of $\Theta^+$- and $\Theta^-$-SAPs and, for example, each juxtaposition $J^-$ corresponds to a unique (+) juxtaposition, the $J^-$-mirror, given by $\widetilde{J^-}$. (Figure 3 displays examples of juxtaposition pairs related by $\sim$. Note that there are only 4 juxtapositions for which $J^+ = \widetilde{J^-}$.) Furthermore, given any $K \in \mathcal{K}$ and, for example, any $J^-(K)$-SAP $\omega$ with after-strand-passage polygon $\omega_S$, the after-strand-passage polygon $\tilde\omega_S$ of the $\Theta_0^+$-SAP $\tilde\omega$ (a $\widetilde{J^-}$-SAP) has the same knot-type $K$ as that of $\omega_S$, except with the opposite chirality in the case that $K$ is chiral. Thus (ignoring chirality) we have that

$$
p_n^{\Theta_0^+}(K) = p_n^{\Theta_0^-}(K) = p_n^{\Theta_0}(K)/2;
\tag{7}
$$
Figure 2. Illustrations of example juxtapositions: for the sake of simplicity we refer to these
juxtapositions, from left to right, as S (straight top), L (L-top), and Z (Z-top). 

and for the signed-juxtaposition polygon counts, 

$$
p_n^J(K) = p_n^{J^+}(K) + p_n^{J^-}(K) = p_n^{J^+}(K) + p_n^{\widetilde{J^-}}(K) = p_n^{J^-}(K) + p_n^{\widetilde{J^+}}(K).
\tag{8}
$$

Thus the juxtaposition-specific after-strand-passage knot probabilities can be determined by focusing on polygons of only one crossing-sign type. We will rely on this fact for our simulation results.
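Equation (8) is what allows unsigned juxtaposition counts to be assembled from simulations of (+) polygons only. The following Python sketch shows the bookkeeping, assuming counts are stored per (+) juxtaposition label and knot-type (chirality ignored). The labels such as "S+" and "~S-" (standing for $\widetilde{S^-}$), the label "X+" for one of the 4 juxtapositions with $J^+ = \widetilde{J^-}$, and all counts are hypothetical, made up only to exercise the formula.

```python
def unsigned_count(plus_counts, j_plus, mirror_of_j_minus, knot):
    """p_n^J(K) = p_n^{J+}(K) + p_n^{~(J-)}(K), equation (8).

    plus_counts maps (label of a (+) juxtaposition, knot-type) -> count at fixed n;
    j_plus labels J+, and mirror_of_j_minus labels the (+) juxtaposition ~(J-).
    """
    return plus_counts.get((j_plus, knot), 0) + plus_counts.get((mirror_of_j_minus, knot), 0)


def unsigned_knotting_probability(plus_counts, j_plus, mirror_of_j_minus, knots):
    """1 - rho_n^J(unknot), computed from (+) counts only."""
    total = sum(unsigned_count(plus_counts, j_plus, mirror_of_j_minus, k) for k in knots)
    unknotted = unsigned_count(plus_counts, j_plus, mirror_of_j_minus, "unknot")
    return 1 - unknotted / total


# Hypothetical (+) counts at one fixed n.
plus_counts = {
    ("S+", "unknot"): 90, ("S+", "3_1"): 10,
    ("~S-", "unknot"): 70, ("~S-", "3_1"): 30,
    ("X+", "unknot"): 5,      # a juxtaposition whose J+ coincides with ~(J-)
}
knots = ["unknot", "3_1"]
p_S_trefoil = unsigned_count(plus_counts, "S+", "~S-", "3_1")
rho_S_knotted = unsigned_knotting_probability(plus_counts, "S+", "~S-", knots)
print(p_S_trefoil, round(rho_S_knotted, 3))
```

When $J^+ = \widetilde{J^-}$, the same (+) count is simply used twice, so the unsigned count is double the signed one.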
Figure 3. Illustrations of juxtapositions (a) $S^+$, (b) $\widetilde{S^+}$ ($S^+$-mirror), (c) $Z^+$, (d) $\widetilde{Z^+}$ ($Z^+$-mirror), (e) $S^-$, (f) $\widetilde{S^-}$, (g) $Z^-$, and (h) $\widetilde{Z^-}$, respectively.



Given that there are 144 different juxtapositions, it is useful to group the juxtapositions further. Motivated by the fact that the "hooked" juxtaposition of [12] is more compact than their "free" juxtapositions, we use two classification schemes to measure the "compactness" or "tightness" of a juxtaposition. The first scheme proposed here is to use the size, denoted $l(G)$, of the smallest $\Theta_0$-SAP that can contain a specified juxtaposition geometry $G$ to measure its compactness. For example, for $G$ given by a single juxtaposition $J$ (or $J^+$ or $J^-$), $G$ is said to be a compactness size-$m$ (or $m^+$ or $m^-$) juxtaposition if $l(G) = m$, and any $\Theta_0$-SAP that contains it is referred to as an $m$- (or $m^+$- or $m^-$-) SAP. Note that $m \in \mathcal{M} = \{14, 16, 18, 20, 22\}$. Also, any $G$-SAP, $G \in \{m, m^+, m^-\}$, whose after-strand-passage polygon has knot-type $K \in \mathcal{K}$ is called a $G(K)$-SAP.

For all but 36 of the unsigned juxtapositions $J \in \mathcal{J}$, $l(J)$ is equal to only one of $l(J^+)$ or $l(J^-)$. For example, a smallest (size-14) $\Theta_0$-SAP that contains juxtaposition Z (as illustrated in figure 2) must contain $Z^+$, and any $\Theta_0$-SAP containing $Z^-$ must have more than 14 edges (in fact at least 22 edges). Consequently juxtaposition $Z^-$ is not as "compact" (by our definition) as $Z^+$. Hence the crossing sign can play a role in how small (compact) a polygon containing a particular juxtaposition can be.

As another measure of compactness, we also define an opening angle associated with each signed juxtaposition. The angle is defined to be consistent with [21, Fig. 1A] in the positive supercoil case. Given a juxtaposition $J^+$ (determined by $v_1, v_2, v_3, v_4$), consider its projection onto the $z = 0$ plane. Under this projection, denote the images of the points $A$, $B$, $C$, $D$, $H$, $v_1$, $v_2$, $v_3$ and $v_4$ by, respectively, $P_A$, $P_B$, $P_C$, $P_D$, $P_H$, $P_1$, $P_2$, $P_3$ and $P_4$. Now, in the $z = 0$ plane, consider the directed line $l_{12}$ from $P_1$ to $P_2$ and the directed line $l_{43}$ from $P_4$ to $P_3$. (Note that the directions on these lines are assigned to be consistent with a positively signed crossing.) These two lines have at least one point of intersection; choose one such point and label it $I$. Now form two rays, $r_{I,2}$ and $r_{I,4}$, where $r_{I,2}$ starts at $I$ and follows $l_{12}$ to $P_2$, and $r_{I,4}$ starts at $I$ and follows $-l_{43}$ to $P_4$. Rays $r_{I,2}$ and $r_{I,4}$ together define the opening angle for juxtaposition $J^+$, with initial leg $r_{I,4}$ and terminal leg $r_{I,2}$. (See, for example, figure 4 with $J^+ = S^+$.) Outside $\Theta$, $v_2$ is joined next to $v_4$ in any polygon containing $J^+$; the opening angle is thus one measure of how "far" apart these two vertices are in such a polygon. Roughly speaking, a larger opening angle yields more space between $v_2$ and $v_4$, and we view the juxtaposition as being more "open". In fact, as shown in figure 5, there is a correlation between the compactness-size of $J^+$ and its opening angle. The opening angle for a juxtaposition $J^-$ is obtained by subtracting the opening angle of $J^+$ from $180^\circ$; this ensures that $J^+$ and its mirror, $\widetilde{J^+}$, have the same opening angle.
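The construction can be sketched in a few lines of Python. Because the angle between two rays depends only on their directions, the sketch assumes (our simplification, not a statement from the text) that the ray from $I$ towards $P_2$ points along $P_2 - P_1$ and the ray from $I$ towards $P_4$ points along $P_4 - P_3$, so $I$ itself is never computed; it reports the unsigned angle in $[0^\circ, 180^\circ]$, the same range as the set of angles listed below. The function name and the example points are hypothetical, chosen as possible projected juxtaposition vertices.

```python
import math


def opening_angle(p1, p2, p3, p4, sign="+"):
    """Opening angle in degrees from the projected points P1..P4 in the z = 0 plane.

    ASSUMPTION: the rays point along P2 - P1 and P4 - P3, so the intersection
    point I is not needed. For a (-) juxtaposition the angle is 180 degrees
    minus the (+) angle, as in the text.
    """
    d12 = (p2[0] - p1[0], p2[1] - p1[1])     # direction of l_12 (towards P2)
    d34 = (p4[0] - p3[0], p4[1] - p3[1])     # direction of -l_43 (towards P4)
    dot = d12[0] * d34[0] + d12[1] * d34[1]
    cross = d12[0] * d34[1] - d12[1] * d34[0]
    angle = math.degrees(math.atan2(abs(cross), dot))   # unsigned angle in [0, 180]
    return angle if sign == "+" else 180.0 - angle


print(round(opening_angle((2, 0), (-2, 0), (0, -1), (0, 1)), 2))     # perpendicular rays
print(round(opening_angle((1, -1), (-1, 1), (1, -1), (-1, 1)), 2))   # parallel rays
print(round(opening_angle((1, -1), (-2, 0), (0, -1), (0, 1), sign="-"), 2))
```

Using `atan2` on the cross and dot products avoids the loss of precision that `acos` suffers near $0^\circ$ and $180^\circ$, which matters here because both extremes occur.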

Using this technique, the possible opening angles for the $\Theta_0$-SAP signed juxtapositions are in the set $\mathcal{A} = \{0^\circ, 18.43^\circ, 26.57^\circ, 45^\circ, 53.13^\circ, 63.43^\circ, 71.57^\circ, 81.87^\circ, 90^\circ, 98.13^\circ, 108.43^\circ, 116.57^\circ, 126.87^\circ, 135^\circ, 153.43^\circ, 161.57^\circ, 180^\circ\}$. For each $\sigma \in \{+, -\}$, any $\Theta_0^\sigma$-SAP with opening angle $a$ is called an $a^\sigma$-SAP, and if its after-strand-passage polygon has knot-type $K$, it is an $a^\sigma(K)$-SAP.

To investigate the probability of knotting as a function of polygon length and juxtaposition compactness or juxtaposition opening angle, we use the associated polygon counts. That is, for $G \in \{m, m^+, m^-, a, a^+, a^- : m \in \mathcal{M}, a \in \mathcal{A}\}$, $p_n^G$ and $p_n^G(K)$ are respectively defined to be the number of $n$-edge $G$-SAPs and $G(K)$-SAPs. First note that all the after-strand-passage polygons formed from SAPs counted in $p_m^m$, $m \in \{14, 16, 18, 20, 22\}$, are unknotted. Also note that SAPs counted in $p_{14}^{14}$ either contain juxtaposition Z or its mirror, $\tilde{Z}$, while those counted in $p_{14}^{14^+}$ all contain juxtaposition $Z^+$.
Figure 4. (a) Juxtaposition $S^+$; the oriented dashed lines $L_{12}$ and $L_{43}$ project, respectively, to $l_{12}$ and $l_{43}$ in the $z = 0$ plane as shown in (b). (b) The opening angle $a$ associated with juxtaposition $S^+$.




Figure 5. Correlation between the compactness-size $m^+$ and opening angle $a^+$ for the 144 (+) juxtapositions.



One goal is to investigate the asymptotic (as $n \to \infty$) properties of the knot probabilities $\rho_n^G(K)$ for knot-type $K \in \mathcal{K}$ and geometry $G$. Towards this end, the asymptotic properties of the numerator and denominator polygon counts of $\rho_n^G(K)$ are explored first. In the next section we prove that both of these counts grow exponentially at the same rate. We then make conjectures (based on heuristic arguments) about the asymptotic properties of $\rho_n^G(K)$.
2.1. Asymptotic properties of $\rho_n^G(K)$

For the set of all SAPs in $\mathbb{Z}^3$, Sumners and Whittington (1989) [25] proved that the following limit exists:

$$
\lim_{n\to\infty} n^{-1}\log p_n(\phi) = \lim_{n\to\infty} n^{-1}\log u_n(\phi) = \kappa_0
\tag{9}
$$

and satisfies

$$
\kappa_0 < \kappa := \lim_{n\to\infty} n^{-1}\log u_n,
\tag{10}
$$

where $u_n$ is the total number (up to translation) of $n$-edge SAPs in the simple cubic lattice and $u_n(\phi)$ is the number (up to translation) of these that are unknotted. The following estimates for $\kappa$ are available: $\kappa = 1.544148 \pm 0.000034$ [26] (via Monte Carlo) and $\kappa = 1.544162 \pm 0.000219$ [27] (via exact enumeration). For $\kappa_0$, a recent estimate for the difference $\kappa - \kappa_0$ is $(4.15 \pm 0.32) \times 10^{-6}$ [28]. The next-order behaviour for these polygon counts is not known rigorously, but it is widely believed [29], backed up by numerical evidence [27, 29-31], that there exist real numbers $A_0$ and $\alpha_0$ such that, as $n \to \infty$:

$$
p_n(\phi) = A_0\, n^{\alpha_0} e^{\kappa_0 n}(1 + o(1)).
\tag{11}
$$
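To give a sense of scale for the estimate of $\kappa - \kappa_0$ quoted above, the following Python arithmetic assumes (a simplification, not a result of this paper) that the unknotted fraction $u_n(\phi)/u_n$ behaves like $e^{-(\kappa - \kappa_0) n}$, ignoring amplitudes and power-law corrections. Under that assumption, unknotted polygons only become rare at lengths of order $1/(\kappa - \kappa_0)$.

```python
import math

delta = 4.15e-6        # central estimate of kappa - kappa_0 quoted above [28]
delta_err = 0.32e-6    # its quoted uncertainty

# ASSUMPTION: u_n(phi) / u_n ~ exp(-delta * n), amplitudes and n^alpha factors ignored.
n0 = 1.0 / delta                                          # characteristic length scale
n0_range = (1.0 / (delta + delta_err), 1.0 / (delta - delta_err))
n_half = math.log(2.0) / delta                            # length where exp(-delta*n) = 1/2
unknot_fraction_1000 = math.exp(-delta * 1000)            # at n = 1000 edges

print(f"n0 ~ {n0:.0f} (range {n0_range[0]:.0f} to {n0_range[1]:.0f})")
print(f"n_half ~ {n_half:.0f}; exp(-delta*1000) ~ {unknot_fraction_1000:.4f}")
```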

To explore the asymptotic properties of $\rho_n^{\Theta_0}(K)$, we focus first on $p_n^{\Theta}$, $p_n^{\Theta_0}(K)$, and $p_n^{\Theta_0}$, and establish relationships between them and $p_n(\phi)$. First note that every $n$-edge $\Theta$-SAP is an $n$-edge unknotted SAP rooted at the origin and therefore

$$
p_n^{\Theta_0}(K) \le p_n^{\Theta_0} \le p_n^{\Theta} \le p_n(\phi) = n\,u_n(\phi).
\tag{12}
$$

Next we show that, given any knot-type $K \in \mathcal{K}$, there exists an integer $m_K$ such that for any $n > m_K$, $u_{n-m_K}(\phi)/2 \le p_n^{\Theta_0}(K)$. To do this, given any $K \in \mathcal{K}$ and any sufficiently large integer $n$, we present a method for constructing an element counted in $p_n^{\Theta_0}(K)$ (an $n$-edge $\Theta_0(K)$-SAP) from an unknotted SAP. The construction involves two steps. The first step is the construction of a specific $\Theta_0(K)$-SAP, $\omega_K$, with length $m_K$. Then, for the second step, given any $(n - m_K)$-edge unknotted SAP $\omega$, a $\Theta_0(K)$-SAP of length $n$ is constructed by "concatenating" $\omega$ to $\omega_K$. The latter step involves defining a way to concatenate a SAP to a $\Theta_0$-SAP so that the result is still a $\Theta_0$-SAP. (See figure 6 for a schematic description of this argument for the case $K = 10_1$.) The full details of both steps are given next. Note that this argument was first presented in [7], but we expand on the details here in order to illustrate that the argument is more widely applicable.
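Granting the lower bound $u_{n-m_K}(\phi)/2 \le p_n^{\Theta_0}(K)$ (established by the construction described next), the equal-growth-rate result follows by squeezing with (12) and (9), with $\rho_n^{\Theta_0}(K)$ denoting the ratio in (4). The following worked derivation spells this out; $n$ ranges over even integers, and $m_K$ is even because $\omega_K$ is a polygon.

```latex
\begin{align*}
\frac{\log u_{n-m_K}(\phi) - \log 2}{n}
  &\le \frac{\log p_n^{\Theta_0}(K)}{n}
  \le \frac{\log n + \log u_n(\phi)}{n},
\intertext{and since $(n-m_K)/n \to 1$ and $(\log n)/n \to 0$, both bounds tend to $\kappa_0$ by (9), so that, using (12) once more,}
\lim_{n\to\infty} n^{-1}\log p_n^{\Theta_0}(K)
  &= \lim_{n\to\infty} n^{-1}\log p_n^{\Theta_0}
  = \lim_{n\to\infty} n^{-1}\log p_n^{\Theta}
  = \kappa_0.
\intertext{Finally, since $p_n^{\Theta_0} \le n\,u_n(\phi)$,}
\frac{u_{n-m_K}(\phi)}{2n\,u_n(\phi)}
  &\le \rho_n^{\Theta_0}(K) \le 1
  \quad\Longrightarrow\quad
  \lim_{n\to\infty} n^{-1}\log \rho_n^{\Theta_0}(K) = 0.
\end{align*}
```

So $\rho_n^{\Theta_0}(K)$ can neither grow nor decay exponentially in $n$.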

For the first step of the construction, in the case that $K = \phi$ (the unknot), figure 1(c) shows a 14-edge $\Theta_0(\phi)$-SAP. Thus we can take $\omega_\phi$ to be the polygon in figure 1(c) and set $m_\phi = 14$, the number of edges in $\omega_\phi$. For any other $K$, i.e. $K \in \mathcal{K} \setminus \{\phi\}$, since $K$ has unknotting number one, by definition [32] there exists a knot diagram (i.e. a signed projection into $\mathbb{R}^2$) of it and a crossing, $X$, in the diagram such that, when the sign assigned to $X$ is changed, the result is the knot diagram $D$ of an unknot. For a fixed such diagram $D$ and the corresponding crossing $X$, $\omega_K$ is formed as follows. First, deform and subdivide $D$ so that it gives a signed embedding, $D'$, in $\mathbb{Z}^2$ such that signs are assigned to vertices of degree 4 and the signed vertices are in one-to-one correspondence with the signed crossings in $D$. Respecting the signs of the vertices in $D'$, add vertical (and, as needed, planar) edges, and then translate and rotate to produce from $D'$ an unknotted $\Theta$-SAP in which the vertices $B$ and $F$ of $\Theta$ correspond to the crossing $X$ in $D'$. The result will be a $\Theta_0(K)$-SAP, and hence such $\Theta_0(K)$-SAPs exist. Let $\omega_K$ be a $\Theta_0(K)$-SAP with the least possible number of edges and let $m_K$ be the number of edges in $\omega_K$. (Note that in this construction the crossing sign of $\Theta$ is fixed to be consistent with the crossing $X$; hence, depending on the chirality of $K$, the same argument can be applied to construct either a $\Theta^+$- or a $\Theta^-$-SAP, or a $J^+$- or $J^-$-SAP.)

Figure 6. On the left, a schematic depiction of the concatenation of an arbitrary unknotted SAP to a $\Theta_0$-SAP; on the right, the resulting knot-type $10_1$ after-strand-passage polygon.

For the second step, we recall the standard procedure (for precise details cf. [33, section 1.2.1] or [7, algorithm 2.2.2]) for concatenating a polygon $\omega_2$ to a polygon $\omega_1$ in $\mathbb{Z}^3$. It involves: translating $\omega_2$ so that its bottom-most edge, $e_2$, is one unit in the positive $x$ direction from the top-most edge, $e_1$, of $\omega_1$; then (if necessary) rotating $\omega_2$ around the $x$-axis so that $e_2$ becomes parallel to $e_1$; and finally deleting $e_1$ and $e_2$ and joining the two polygons by adding two new parallel edges in the positive $x$ direction. It is straightforward to show from the definitions (cf. [7, corollary 2.2.3]) that the standard concatenation of any unknotted SAP $\omega_2$ to any $\Theta$-SAP $\omega_1$ yields a $\Theta$-SAP. Thus, for any even $n \ge m_K + 4$, concatenating any $(n - m_K)$-edge unknotted SAP $\omega$ to $\omega_K$ yields an $n$-edge $\Theta_0(K)$-SAP. Since there are only two possible choices for the initial orientation of $e_2$ relative to $e_1$ (they are either parallel or perpendicular), at most two different choices of $\omega_2$ can lead, via this concatenation procedure, to the same concatenated polygon.
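The concatenation step just described can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the construction of [33] or [7]: polygons are cyclic vertex lists, the "top-most" and "bottom-most" edges are chosen among the edges lying in the extreme $x$-planes by a lexicographic tie-break that we made up for the example, and the code checks only self-avoidance (it neither tracks $\Theta$ nor computes knot types). All helper names are hypothetical.

```python
from typing import List, Tuple

Vertex = Tuple[int, int, int]
Polygon = List[Vertex]  # vertices in cyclic order; the last vertex is joined to the first


def edges(poly: Polygon) -> List[Tuple[Vertex, Vertex]]:
    n = len(poly)
    return [(poly[i], poly[(i + 1) % n]) for i in range(n)]


def is_sap(poly: Polygon) -> bool:
    """True if poly is a closed self-avoiding polygon on the simple cubic lattice."""
    if len(poly) < 4 or len(set(poly)) != len(poly):
        return False
    return all(sum(abs(a - b) for a, b in zip(u, v)) == 1 for u, v in edges(poly))


def extreme_edge(poly: Polygon, top: bool) -> Tuple[Vertex, Vertex]:
    """An edge in the plane x = max x (top=True) or x = min x (top=False).

    Hypothetical tie-break: the lexicographically largest (top) or smallest
    (bottom) edge, with each edge's endpoints listed in sorted order.
    """
    xs = [v[0] for v in poly]
    x0 = max(xs) if top else min(xs)
    cands = [tuple(sorted(e)) for e in edges(poly) if e[0][0] == e[1][0] == x0]
    return max(cands) if top else min(cands)


def rot_x(v: Vertex) -> Vertex:
    """Rotate by 90 degrees about the x-axis: (x, y, z) -> (x, -z, y)."""
    x, y, z = v
    return (x, -z, y)


def path_without_edge(poly: Polygon, u: Vertex, v: Vertex) -> Polygon:
    """All vertices of poly, as a path from u to v that does not use the edge u-v."""
    n, i = len(poly), poly.index(u)
    step = -1 if poly[(i + 1) % n] == v else 1
    return [poly[(i + step * k) % n] for k in range(n)]


def concatenate(w1: Polygon, w2: Polygon) -> Polygon:
    """Concatenate w2 to w1; w1 is never moved, so anything it has at the origin stays put."""
    a, b = extreme_edge(w1, top=True)    # top-most edge e1 of w1 (a < b)
    c, d = extreme_edge(w2, top=False)   # bottom-most edge e2 of w2 (c < d)
    if (b[1] - a[1], b[2] - a[2]) != (d[1] - c[1], d[2] - c[2]):
        # e2 is perpendicular to e1: rotate w2 about the x-axis to make them parallel
        w2 = [rot_x(v) for v in w2]
        c, d = sorted((rot_x(c), rot_x(d)))
    shift = (a[0] + 1 - c[0], a[1] - c[1], a[2] - c[2])   # puts c one unit in +x from a

    def move(v: Vertex) -> Vertex:
        return (v[0] + shift[0], v[1] + shift[1], v[2] + shift[2])

    w2, c, d = [move(v) for v in w2], move(c), move(d)
    # Delete e1 and e2; the new +x edges a-c and d-b close the combined cycle.
    return path_without_edge(w1, b, a) + path_without_edge(w2, c, d)
```

For example, concatenating two unit squares gives an 8-edge rectangle. In general the result has `len(w1) + len(w2)` edges, matching the count $n = m_K + (n - m_K)$ used above, and `w1` is left in place, which is why a $\Theta$ at the origin of $\omega_K$ would survive the operation.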

Since $K \in \mathcal{K}$ was chosen arbitrarily and $\omega_2$ was an arbitrary $(n - m_K)$-edge unknotted SAP in the above construction, we conclude: for each $K \in \mathcal{K}$, there exists an integer $m_K \ge 14$ such that for any even $n \ge m_K + 4$,

$$\tfrac{1}{2}\,u_{n-m_K}(\phi) \le p_n^{\Theta_0}(K) \le p_n^{\Theta_0} \le p_n^{\Theta} \le p_n(\phi), \qquad (13)$$

where the factor of $1/2$ accounts for the possibility that two different $\omega_2$ yield identical concatenated polygons. Hence

$$\kappa_0 = \lim_{n\to\infty} n^{-1}\log p_n^{\Theta_0}(K) = \lim_{n\to\infty} n^{-1}\log p_n^{\Theta_0} = \lim_{n\to\infty} n^{-1}\log p_n^{\Theta}. \qquad (14)$$
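The passage from (13) to (14) is a squeeze argument, spelled out here for convenience. Taking logarithms in (13), using $p_n(\phi) = n\,u_n(\phi)$ from (12), and dividing by $n$ gives, for even $n \ge m_K + 4$,

$$\frac{n-m_K}{n}\cdot\frac{\log u_{n-m_K}(\phi)}{n-m_K} - \frac{\log 2}{n} \;\le\; \frac{\log p_n^{\Theta_0}(K)}{n} \;\le\; \frac{\log p_n^{\Theta_0}}{n} \;\le\; \frac{\log p_n^{\Theta}}{n} \;\le\; \frac{\log n}{n} + \frac{\log u_n(\phi)}{n}.$$

Since $m_K$ is fixed, $(n-m_K)/n \to 1$ and $(\log n)/n \to 0$, so both outer bounds tend to $\kappa_0 = \lim_{n\to\infty} n^{-1}\log u_n(\phi)$ (taken along even $n$), and each middle term therefore has the same limit.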

As noted above, the construction leading to equation (13) also applies to any juxtaposition $J^+$ or $J^-$, $J \in \mathcal{J}$, and any $K \in \mathcal{K}$ that can result from performing a strand passage at such a juxtaposition. Thus, given any geometric specification $G$ which is defined in terms of either a single signed juxtaposition or any group of signed juxtapositions, we also have that: for each $K \in \mathcal{K}$, there exists an integer $m_K \ge 14$ such that for any even $n \ge m_K + 4$,

$$\tfrac{1}{2}\,u_{n-m_K}(\phi) \le p_n^{G}(K) \le p_n^{G} \le p_n^{\Theta} \le p_n(\phi). \qquad (15)$$

Hence, for example,

$$\kappa_0 = \lim_{n\to\infty} n^{-1}\log p_n^{G}(K) = \lim_{n\to\infty} n^{-1}\log p_n^{G} = \lim_{n\to\infty} n^{-1}\log p_n^{\Theta}. \qquad (16)$$

One direct consequence of this result for the asymptotic properties of the knot probabilities is that $\rho_n^{\Theta_0}(K)$ and $\rho_n^{G}(K)$ do not grow (or decay) exponentially with $n$, that is,

$$\lim_{n\to\infty} n^{-1}\log \rho_n^{\Theta_0}(K) = \lim_{n\to\infty} n^{-1}\log \rho_n^{G}(K) = 0. \qquad (17)$$
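Equation (17) is immediate once the knot probabilities are written as ratios of counts, $\rho_n^{\Theta_0}(K) = p_n^{\Theta_0}(K)/p_n^{\Theta_0}$ (and similarly for $G$). By (14),

$$\frac{1}{n}\log\rho_n^{\Theta_0}(K) = \frac{1}{n}\log p_n^{\Theta_0}(K) - \frac{1}{n}\log p_n^{\Theta_0} \;\longrightarrow\; \kappa_0 - \kappa_0 = 0,$$

and the same computation with (16) gives the statement for $\rho_n^{G}(K)$. This does not show that $\rho_n^{\Theta_0}(K)$ itself converges; that question is the subject of conjecture 3.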

Another consequence is that, as $n \to \infty$, each of $p_n(\phi)$, $p_n^{\Theta}$, $p_n^{\Theta_0}$, $p_n^{\Theta_0}(K)$, $p_n^{G}$ and $p_n^{G}(K)$ can be written in the form $e^{\kappa_0 n + o(n)}$. It is expected that, just as for $p_n(\phi)$, the more detailed asymptotic behaviour of each of these quantities has a form similar to that given in equation (11), but with an amplitude $A$ and critical exponent $\alpha_0$ that may be quantity dependent. For convenience, we use the notation $A_*$ and $\alpha_*$ to denote, respectively, the amplitude and critical exponent corresponding to the polygon counts for $n$-edge $*$-SAPs, for each $* \in \{G, G(K) : G \in \mathcal{G}, K \in \mathcal{K}\}$, with $\mathcal{G} = \{\Theta, \Theta_0, \Theta^\sigma, \Theta_0^\sigma, J, J^\sigma, m, m^\sigma, a, a^\sigma, T : J \in \mathcal{J}, \sigma \in \{+,-\}, m \in \mathcal{M}, a \in \mathcal{A}\}$. The existence of the limits that would define these amplitudes or critical exponents has not been proved for any of these quantities. Instead, we next give a heuristic argument that leads us to conjectures about relationships between the critical exponents and about the asymptotic behaviour of $\rho_n^{\Theta_0}(K)$. These conjectures are then investigated numerically in sections 4.1 and 4.2.

Consider the $\Theta_0$-SAP shown in figure 1(c). Removing any one of the polygon edges that is not part of $\Theta$ yields a "pattern", $P_\phi$, which can occur as a subwalk of some unknotted polygon more than once, and indeed arbitrarily often. Similarly, such a pattern $P_K$ can be obtained from $\omega_K$ for each $K \in \mathcal{K} \setminus \{\phi\}$. Consistent with what is known for all polygons [25], it is believed (although not proved) that, given one of these patterns $P_K$, there exists an $\epsilon > 0$ such that all but exponentially few sufficiently large $n$-edge unknotted SAPs contain $P_K$ as a subwalk at least $\epsilon n$ times. If this were true, then almost all large enough unknotted polygons would contain several copies of $P_K$, and hence several translated copies of $\Theta_0$. Such an unknotted polygon can then be translated so that any one of the $\epsilon n$ copies of $P_K$ contains $O$, and each resulting polygon is a distinct $\Theta_0(K)$-SAP. This leads us to the following conjecture.

Conjecture 1. Given any $K \in \mathcal{K}$, there exist $\epsilon_K > 0$ and an integer $N_K > 0$ such that for all $n > N_K$,

$$\epsilon_K\, n\, u_n(\phi) \le p_n^{\Theta_0}(K). \qquad (18)$$

If the critical exponents exist and conjecture 1 is true, then the following is a direct consequence.

Conjecture 2. $\alpha_{\Theta} = \alpha_{\Theta_0} = \alpha_{\Theta_0(K)} = \alpha_0$ for all $K \in \mathcal{K}$.
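To see why conjecture 2 follows from conjecture 1 (assuming each count behaves as $A_* n^{\alpha_*} e^{\kappa_0 n}(1+o(1))$), combine (18) with (12) and $p_n(\phi) = n\,u_n(\phi)$:

$$\epsilon_K\, p_n(\phi) \le p_n^{\Theta_0}(K) \le p_n^{\Theta_0} \le p_n^{\Theta} \le p_n(\phi).$$

Dividing through by $p_n(\phi) = A\,n^{\alpha_0}e^{\kappa_0 n}(1+o(1))$ gives, for each $* \in \{\Theta, \Theta_0, \Theta_0(K)\}$,

$$\epsilon_K \le \frac{A_*}{A}\, n^{\alpha_* - \alpha_0}\,(1+o(1)) \le 1,$$

which can hold for all large $n$ only if $\alpha_* = \alpha_0$ (and then $\epsilon_K A \le A_* \le A$).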

Furthermore, if the asymptotic form of equation (11) applies generally, then the following conjectures are a further consequence.

Conjecture 3. Given any $K \in \mathcal{K}$,

$$0 < \rho^{\Theta_0}(K) := \lim_{n\to\infty} \rho_n^{\Theta_0}(K) = \frac{A_{\Theta_0(K)}}{A_{\Theta_0}} < 1. \qquad (19)$$

Conjecture 4. Given any knot-type $K \in \mathcal{K}$ and any $G \in \mathcal{G}$,

$$0 < \rho^{G}(K) := \lim_{n\to\infty} \rho_n^{G}(K) = \frac{A_{G(K)}}{A_{G}} < 1. \qquad (20)$$

Equation (17) and conjectures 2 and 3 will be investigated numerically, based on Monte Carlo data, in sections 4.1 and 4.2, respectively. Conjecture 4 will be explored in section 5, with $G$ ranging from specific juxtapositions to groupings such as the compactness-size grouping or the opening-angle grouping. The details of the Monte Carlo simulations used to generate the data for these studies are presented next.



3. Simulation Specifics 

To study $\Theta$-SAPs, Szafron [6] developed the $\Theta$-BFACF algorithm by modifying the BFACF algorithm [17-19] so that it generates only $\Theta$-SAPs. Based on the arguments of Janse van Rensburg and Whittington [20], Szafron [6] proved that the $\Theta$-BFACF algorithm preserves the knot-type of the initial polygon. This non-trivial proof relies on the fact that the two strands of the $\Theta$ structure have enough lattice space separating them to allow another polygon strand to pass through; this guarantees that Reidemeister III moves are possible using a sequence of BFACF moves. In the case that the initial polygon is unknotted, Szafron proved that the $\Theta$-BFACF algorithm has two ergodicity classes, which correspond precisely to the $\Theta^+$- and $\Theta^-$-SAPs defined in the previous section. However, as seen from equations (7)-(8), all the quantities of interest can be investigated using, for example, $\Theta^-$-SAPs only, and we focus on these. Thus, from a run of the $\Theta$-BFACF algorithm, a sample of $\Theta^-$-SAPs is obtained. Estimates of the one-step transition knot probabilities of interest can then be obtained from such a sample by performing a single strand passage on each sampled polygon and recording the knot-type of each resulting after-strand-passage polygon.

The $\Theta$-BFACF algorithm generates a Markov chain $\{X_t, t = 0, \ldots, T\}$ such that at each time $t$, $X_t$ is a $\Theta$-SAP in the same ergodicity class (crossing-sign class) as $X_0$. Here $X_0$ is an unknotted $\Theta^-$-SAP, and we define the set of these to be $\mathcal{S}$. The three standard BFACF moves are used to go from $X_t$ to $X_{t+1}$; however, the probability distribution according to which a move is attempted is modified from that of the BFACF algorithm to accommodate the reduced state space.

For a fixed integer $q$ and a fixed real-valued $z$ such that $0 < z < z_\phi := e^{-\kappa_0}$, the $\Theta$-BFACF one-step transition probabilities $P_{\omega\omega'} = P(X_{t+1} = \omega' \mid X_t = \omega)$, for all $\omega, \omega' \in \mathcal{S}$, are chosen so that the equilibrium probability distribution, $\{\pi_\omega(q, z), \omega \in \mathcal{S}\}$, of the Markov chain is given by

"<""-£^W' , ' u " 6 '- (21) 

Because the $\Theta$-BFACF algorithm is based on the BFACF algorithm, it suffers from the same major disadvantage: as $z \to z_\phi$, the exponential autocorrelation time of the algorithm approaches infinity [34]. Hence, to reduce the exponential autocorrelation time, a composite Markov chain (CMC) implementation of the $\Theta$-BFACF algorithm is used. It is similar to the multiple Markov chain BFACF algorithm introduced in [30,31], except that now the equilibrium distribution for a single chain is given by equation (21).
Given any integer $M > 1$ and a real vector $\mathbf{z} := (z_1, z_2, \ldots, z_M)$ such that $0 < z_1 < z_2 < \cdots < z_M < z_\phi$, Szafron [6] proved that the CMC $\Theta$-BFACF algorithm is ergodic on $\mathcal{S}^M$ and has the unique stationary distribution given by

$$\{\pi_{\boldsymbol{\omega}}(q, \mathbf{z}),\ \boldsymbol{\omega} \in \mathcal{S}^M\}, \qquad (22)$$

where for $\boldsymbol{\omega} = (\omega_1, \omega_2, \ldots, \omega_M) \in \mathcal{S}^M$,

$$\pi_{\boldsymbol{\omega}}(q, \mathbf{z}) := \prod_{i=1}^{M} \pi_{\omega_i}(q, z_i), \qquad (23)$$

with $\pi_{\omega_i}(q, z_i)$, the distribution of the $i$th chain, as in equation (21).
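Because the product form (23) is stationary, an exchange of the states of chains $i$ and $i+1$ can be accepted with a Metropolis ratio in which almost everything cancels. The sketch below assumes (this is not restated in the present section) that equation (21) has the form "length-dependent weight $\times\, z^{|\omega|}$ / normalizing sum", as in standard multiple Markov chain BFACF implementations; then only the $z^{|\omega|}$ factors survive. The helper names are hypothetical, polygon lengths stand in for full polygon states, and the BFACF moves themselves are not implemented. `schedule_counts` tallies the move/swap schedule described in the next paragraph (five parallel moves per attempted swap).

```python
import random


def swap_acceptance(n_i: int, n_j: int, z_i: float, z_j: float) -> float:
    """Metropolis probability of exchanging the states of chains i and j.

    Assumes pi_omega(q, z) = w(|omega|) * z**|omega| / Q(q, z) for some
    length-dependent weight w, so w and the normalizing sums Q cancel in
    pi(omega_j; z_i) * pi(omega_i; z_j) / (pi(omega_i; z_i) * pi(omega_j; z_j)).
    """
    return min(1.0, (z_i / z_j) ** (n_j - n_i))


def attempt_swap(lengths: list, zs: list, rng: random.Random) -> bool:
    """Try to swap the polygons of a random chain i and chain i + 1 (in place).

    Only polygon lengths are tracked here; a real implementation would swap
    the full polygon states.
    """
    i = rng.randrange(len(zs) - 1)
    if rng.random() < swap_acceptance(lengths[i], lengths[i + 1], zs[i], zs[i + 1]):
        lengths[i], lengths[i + 1] = lengths[i + 1], lengths[i]
        return True
    return False


def schedule_counts(parallel_moves: int, moves_per_swap: int = 5) -> tuple:
    """(attempted swaps, total time steps) when every block of moves is followed by one swap."""
    swaps = parallel_moves // moves_per_swap
    return swaps, parallel_moves + swaps
```

With $1.5 \times 10^{11}$ parallel moves, `schedule_counts` gives $0.3 \times 10^{11}$ swaps and $1.8 \times 10^{11}$ time steps, the totals quoted below. A swap that hands the longer polygon to the chain with the smaller $z$ is accepted with probability less than one.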

The simulation of the CMC $\Theta$-BFACF algorithm used for this work consisted of ten independent replications. For the results in [7] and section 4.1, each replication was run for a total of $1.8 \times 10^{11}$ time steps ($1.5 \times 10^{11}$ $\Theta$-BFACF moves in parallel and $0.3 \times 10^{11}$ attempted swaps), where every sequence of five $\Theta$-BFACF moves in parallel was followed by an attempted swap between a randomly selected chain (call it chain $i$) and chain $i + 1$. For the results in sections 4.2 and 5, each replication was extended to 250 billion $\Theta$-BFACF moves in parallel. For the distribution given by equation (21), $q$ is set to 2. For each individual replication, the number of chains and the distribution of the $z_i$'s over the interval $[0.2030, 0.2132]$ are: $M = 14$, $z_1 = 0.2030$, $z_2 = 0.2050$, $z_3 = 0.2070$, $z_4 = 0.2090$, $z_5 = 0.2100$, $z_6 = 0.2105$, $z_7 = 0.2110$, $z_8 = 0.2115$, $z_9 = 0.2120$, $z_{10} = 0.2124$, $z_{11} = 0.2128$, $z_{12} = 0.2130$, $z_{13} = 0.2131$, and $z_{14} = 0.2132$. These values of $z$ are valid for the $\Theta$-BFACF algorithm because, for $i = 1, \ldots, 14$,

$$z_i < z_\phi < 0.2135$$

[6,35]. One motivation for using this distribution of $z$-values and $M = 14$ is that these choices have been well studied for unknotted SAPs using the BFACF algorithm (cf. [29,30]).
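The validity condition on the $z$-grid is easy to check numerically. As an illustrative assumption, the sketch below takes $z_\phi = e^{-\kappa_0}$ with $\kappa_0$ from our own estimate (24), rather than the value used in [6,35].

```python
import math

KAPPA_0 = 1.544148                  # best MLE of kappa_0, equation (24)
Z_PHI = math.exp(-KAPPA_0)          # z_phi := e^{-kappa_0}

# z-values of the M = 14 chains used in each replication
Z_VALUES = [0.2030, 0.2050, 0.2070, 0.2090, 0.2100, 0.2105, 0.2110,
            0.2115, 0.2120, 0.2124, 0.2128, 0.2130, 0.2131, 0.2132]


def valid_z_grid(zs, z_phi):
    """True if 0 < z_1 < z_2 < ... < z_M < z_phi."""
    increasing = all(a < b for a, b in zip(zs, zs[1:]))
    return increasing and 0 < zs[0] and zs[-1] < z_phi


print(f"z_phi = {Z_PHI:.6f}; gap above z_14 = {Z_PHI - Z_VALUES[-1]:.6f}")
print(valid_z_grid(Z_VALUES, Z_PHI))
```

The largest $z$-value sits only about $3 \times 10^{-4}$ below $z_\phi$, which is why long autocorrelation times near $z_\phi$ matter for this run.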

The amount of time required for the entire process to equilibrate ($\tau_{\exp}$) was estimated using Gelman and Rubin's estimated potential scale reduction technique [36,37]. Applying this technique, we estimate $\tau_{\exp} = 5.0$ billion $\Theta$-BFACF moves in parallel (i.e. after 5.0 billion $\Theta$-BFACF moves in parallel, the estimated between-replication and within-replication variances have converged to within 2.5% of the same value).
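The potential scale reduction factor compares the between-replication and within-replication variances. Below is a minimal sketch of the simplified textbook form (without the degrees-of-freedom correction). Reading "converged to within 2.5%" as $\hat V / W \le 1.025$ is our illustrative interpretation, and `psrf` is a hypothetical helper, not the implementation used in [36,37].

```python
from statistics import fmean, variance


def psrf(chains):
    """Gelman-Rubin potential scale reduction factor (simplified form).

    chains: equal-length sequences of one observable, one per replication.
    Returns sqrt(V_hat / W); values close to 1 indicate that the between- and
    within-replication variance estimates agree.
    """
    m, n = len(chains), len(chains[0])
    means = [fmean(c) for c in chains]
    grand_mean = fmean(means)
    b = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in means)  # between replications
    w = fmean([variance(c) for c in chains])                      # within replications
    v_hat = (n - 1) / n * w + b / n                               # pooled variance estimate
    return (v_hat / w) ** 0.5


# Replications that agree give a factor near (here slightly below) 1;
# replications stuck at different levels give a factor far above 1.
print(psrf([[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]))
print(psrf([[1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0]]))
```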

In order to estimate the amount of "essentially independent" data collected during each replication, Fishman's block analysis technique (cf. [38]) was used. This technique yielded $\tau_{\mathrm{int}} = 0.7$ billion $\Theta$-BFACF moves in parallel. We therefore conclude that states that are 1.4 billion $\Theta$-BFACF moves in parallel apart (i.e. $2\tau_{\mathrm{int}}$ apart) are essentially independent, and that data subdivided into blocks of 1.4 million consecutive data points form essentially independent blocks of data. Thus the results presented in section 4.1 are based on 1070 essentially independent blocks, and those in sections 4.2 and 5 on 1785 essentially independent blocks. Sokal [34] has argued that, if $\tau_{\exp}$ is less than 5% of the total length of the replication, then the bias introduced into the estimates by not discarding the first $\tau_{\exp}$ data points will be much smaller than the actual statistical error. Based on this argument, no data was discarded for any of our estimates (40 blocks is less than 5% of $1070 < 1785$ blocks).
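The block (batch-means) idea behind these numbers can be sketched as follows. This is a generic batch-means estimator of $\tau_{\mathrm{int}}$ in the spirit of, but not necessarily identical to, Fishman's procedure in [38]; it is tried on a synthetic AR(1) series whose $\tau_{\mathrm{int}}$ is known exactly. The second helper reproduces the burn-in count used with Sokal's 5% rule, with blocks of length $2\tau_{\mathrm{int}}$. Both helper names are hypothetical.

```python
import math
import random
from statistics import fmean, pvariance, variance


def tau_int_batch_means(x, batch_size):
    """Batch-means estimate of the integrated autocorrelation time of x.

    Uses Var(batch mean) ~ 2 * tau_int * Var(x) / batch_size, which holds when
    batch_size is much larger than tau_int.
    """
    k = len(x) // batch_size
    means = [fmean(x[j * batch_size:(j + 1) * batch_size]) for j in range(k)]
    return batch_size * variance(means) / (2 * pvariance(x))


def burn_in_blocks(tau_exp, block_length, replications):
    """Blocks, summed over replications, that overlap the first tau_exp steps."""
    return math.ceil(tau_exp / block_length) * replications


# Synthetic AR(1) series x_t = phi * x_{t-1} + noise, with tau_int = (1 + phi) / (2 (1 - phi)).
rng = random.Random(2024)
phi, x, xt = 0.9, [], 0.0
for _ in range(200_000):
    xt = phi * xt + rng.gauss(0.0, 1.0)
    x.append(xt)
print(tau_int_batch_means(x, 2_000))   # the true tau_int for phi = 0.9 is 9.5

# Values from the text: tau_exp = 5.0e9 moves, blocks of 2 * tau_int = 1.4e9 moves, 10 replications
blocks = burn_in_blocks(5.0e9, 1.4e9, 10)
print(blocks, blocks / 1070 < 0.05)    # 40 True
```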
Table 1. Our CMC MLEs for $\kappa_0$. The values in parentheses are the estimated 95% margins of error.

Property $*$          $N^*_{\min}$   $N^*_{\max}$   $\kappa_0$ (95% ME)
$\Theta_0$            98             3300           1.544148 (0.000012)
$\Theta_0(\phi)$      86             3300           1.544147 (0.000013)
$\Theta_0(3_1)$       162            2000           1.544135 (0.000023)


4. Maximum Likelihood Estimates and Limiting Knot Probability Results 

We have developed [7] two methods for the statistical analysis of the CMC $\Theta$-BFACF Monte Carlo
data. The first is a method for obtaining maximum likelihood estimates for the growth constants 
and critical exponents of the polygon counts involved in the knotting probability calculation. The 
second is a method for investigating the length (n) dependence of the knotting probabilities using 
a correlated sequence of polygon data from a CMC run; the effect of the correlation is reduced by 
grouping polygons having lengths within a given range to obtain what we call grouped-n estimates. 
Next, in section 4.1 and appendix A, we summarize the maximum likelihood method and present 
results related to equation (17) and conjecture 2. In section 4.2, we review the grouped-n estimate 
approach and present results related to conjecture 3 using data beyond that of [7]. 

4.1. Maximum Likelihood Estimates from CMC $\Theta$-BFACF Data

In [39], a method (referred to here as the Berretti-Sokal MLE method) was proposed for obtaining maximum likelihood estimates (MLEs) for $\kappa$ and $\gamma$ from a Markov chain Monte Carlo simulation consisting of several independent sample paths. Here $\kappa$ and $\gamma$ are the growth constant and critical exponent in the asymptotic form for the number, $c_n$, of $n$-step self-avoiding walks (SAWs) starting at the origin, that is, $c_n \sim A e^{\kappa n} n^{\gamma - 1}$. In [7], the Berretti-Sokal MLE method was modified for the case that the Markov chain data come from a CMC Monte Carlo sample path. For completeness, we summarize the new MLE method of [7] in appendix A and review in this section the results regarding the asymptotic properties of the $\Theta$-SAP counts.

The main purpose of the MLE method is to investigate conjecture 2 and also to obtain an estimate of $\kappa_0$ (see equation (14)). To explore conjecture 2, statistical estimates of $\alpha_*$, $* \in \mathcal{G}$, are needed. To do this, for a given choice of $*$, a log-likelihood function is obtained from the CMC Monte Carlo data generated. Maximizing the log-likelihood with respect to $\kappa_0$ and $\alpha_*$ yields maximum likelihood estimates for these parameters. The relevant log-likelihood function is defined in appendix A.

Using this method yields the estimates for $\kappa_0$ in table 1 and for $\alpha_*$, $* \in \{\Theta_0, \Theta_0(\phi), \Theta_0(3_1)\}$, in table 2. The estimates for $\kappa_0$ in table 1 are all equal to four decimal places, and after rounding to four decimal places they also equal a previous direct estimate, $\kappa_0 = 1.544067 \pm 0.000811$ [29]. Thus our estimates for $\kappa_0$ numerically support the proven result, equation (14), that $p_n^{\Theta_0}$ and $p_n^{\Theta_0}(K)$ grow at the same exponential rate as $p_n(\phi)$ as $n \to \infty$.

Because it provides our largest data sample, we use all the sampled $\Theta^-$-SAPs in the MLE analysis to determine our best estimates for $\kappa_0$ and $\alpha_\Theta$. Our resulting best estimates for $\kappa_0$ and $\alpha_\Theta$ are

$$\kappa_0 = 1.544148 \pm 0.000014 \ (\pm 0.00005) \qquad (24)$$

and

$$\alpha_\Theta = -1.78 \pm 0.02 \ (\pm 0.02), \qquad (25)$$

given in the form

$$\text{parameter} = \text{point estimate} \pm \text{95\% ME} \ (\pm \text{systematic error}). \qquad (26)$$

Table 2. The best CMC MLEs for $\alpha_{\Theta_0}$, $\alpha_{\Theta_0(\phi)}$ and $\alpha_{\Theta_0(3_1)}$. The values in parentheses are the estimated 95% margins of error.

Property $*$          $N^*_{\min}$   $N^*_{\max}$   $\alpha_*$ (95% ME)
$\Theta_0$            98             3300           -1.7804 (0.0304)
$\Theta_0(\phi)$      86             3300           -1.7793 (0.0229)
$\Theta_0(3_1)$       162            2000           -1.9498 (0.2938)

For the systematic error term, we report an error associated with the uncertainty in the choice of $N_{\min}$ (defined in the appendix). It is obtained by taking the largest difference, over a range of $N_{\min}$ choices, between the resulting estimates and our reported point estimate.
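The systematic-error rule is simply a maximum deviation. The sketch below uses made-up estimates for three $N_{\min}$ values purely for illustration; they are not the paper's data.

```python
def systematic_error(estimates_by_nmin, point_estimate):
    """Largest |estimate - point_estimate| over the N_min choices tried."""
    return max(abs(est - point_estimate) for est in estimates_by_nmin.values())


# Hypothetical estimates of an exponent for three N_min choices (illustration only)
trial = {80: -1.790, 98: -1.7804, 120: -1.770}
print(round(systematic_error(trial, -1.7804), 4))   # 0.0104
```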

Another estimate for $\kappa_0$ is 1.544158, which is based on the estimate $\kappa \approx 1.544162 \pm 0.000219$ (Clisby et al. [27]) and the estimate for the difference $\kappa - \kappa_0 \approx (4.15 \pm 0.32) \times 10^{-6}$ (Janse van Rensburg [28]). Our best estimate for $\kappa_0$ is consistent with this. Our best estimate for $\alpha_\Theta$ is used next to explore the validity of conjecture 2.
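The arithmetic behind this comparison is short. The indirect value carries its own much larger uncertainty ($\pm 0.000219$ from [27]), so the check below is deliberately simple.

```python
KAPPA_ENUM = 1.544162          # kappa from exact enumeration, Clisby et al. [27]
KAPPA_MINUS_KAPPA0 = 4.15e-6   # kappa - kappa_0, Janse van Rensburg [28]
kappa0_indirect = KAPPA_ENUM - KAPPA_MINUS_KAPPA0

KAPPA0_BEST, ME95, SYS = 1.544148, 0.000014, 0.00005   # equation (24)
gap = abs(kappa0_indirect - KAPPA0_BEST)

print(round(kappa0_indirect, 6))   # 1.544158
print(gap < ME95)                  # True
```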

From table 2 and equation (25), the estimates for $\alpha_\Theta$, $\alpha_{\Theta_0}$ and $\alpha_{\Theta_0(\phi)}$ are equal when rounded to two decimal places; this supports $\alpha_\Theta = \alpha_{\Theta_0} = \alpha_{\Theta_0(\phi)}$, as in conjecture 2. The computed 95% confidence interval for $\alpha_{\Theta_0(3_1)}$ completely contains the computed 95% confidence intervals for $\alpha_{\Theta_0}$ and $\alpha_{\Theta_0(\phi)}$; this does not contradict the part of conjecture 2 that states $\alpha_{\Theta_0(3_1)} = \alpha_{\Theta_0} = \alpha_{\Theta_0(\phi)}$.

To explore the last part of conjecture 2, that all of these exponents are equal to $\alpha_0$, we use our best estimate for the critical exponent $\alpha_\Theta$ (assumed to be equal to $\alpha_{\Theta_0} = \alpha_{\Theta_0(K)}$ for any $K \in \mathcal{K}$). In order to compare this to $\alpha_0$, note that Orlandini et al [29] estimated $\alpha_0 - 1 \approx -2.77$. This value gives $\alpha_0 \approx -1.77$. Since this value is contained in our estimated 95% confidence interval for $\alpha_\Theta$ given by equation (25), the final part of conjecture 2 is supported numerically.
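The interval comparisons made in the last two paragraphs can be checked directly from table 2 and equation (25). A confidence interval here is simply point estimate $\pm$ 95% ME, ignoring the systematic error.

```python
def ci(point, me):
    """95% confidence interval from a point estimate and its 95% margin of error."""
    return (point - me, point + me)


def contains(outer, inner):
    return outer[0] <= inner[0] and inner[1] <= outer[1]


# Table 2 and equation (25)
alpha_theta0 = ci(-1.7804, 0.0304)
alpha_theta0_unknot = ci(-1.7793, 0.0229)
alpha_theta0_trefoil = ci(-1.9498, 0.2938)
alpha_theta_best = ci(-1.78, 0.02)

# Orlandini et al. [29]: alpha_0 - 1 is approximately -2.77
alpha0 = -2.77 + 1

print(contains(alpha_theta0_trefoil, alpha_theta0),
      contains(alpha_theta0_trefoil, alpha_theta0_unknot),
      alpha_theta_best[0] <= alpha0 <= alpha_theta_best[1])   # True True True
```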

4.2. Estimating the Limiting Knot Probabilities

In this section, we explore conjecture 3. The quantities $\rho^{\Theta_0}(K)$, for each $K \in \mathcal{K}$, will generally be referred to as limiting probabilities. As discussed in section 2.1, the existence of these limiting probabilities is an open question. Conjecture 3 postulates not only that these limits exist, but also that they are never zero or one.

To obtain an indication of the order of magnitude of these limiting probabilities, the frequencies of $\Theta_0(K)$-SAPs observed in our CMC data are summarized in table 3. (Note that the "Other" category in table 3 contains the observed number of $\Theta_0(K)$-SAPs over all $K \in \mathcal{K}$ with crossing number greater than five.) Thus, roughly, we expect the limiting probabilities to be close to 1 for the unknot, of the order $10^{-2}$ for the trefoil, of the order $10^{-4}$ for the figure-eight knot, of the order $10^{-5}$ for knot-type $5_2$, and of the order $10^{-7}$ for knots with at least six crossings. This ranking and the orders of magnitude of the $K \neq \phi$ knot probabilities are comparable to those obtained for other strand-passage models. For example, in [10, column 1 of Table 2], for $K = \phi, 3_1, 4_1, 5_2$ the probabilities (for lattice polygons with mean length 100) are 0.852, 0.061, 0.022, 0.0016, and in [8, column 1 of Table 1] (for freely jointed equilateral polygons of length 33) they are 0.9457, 0.0227, 0.0073, 0.0006. We expect these probabilities to be both length and model dependent, and hence we do not make more direct comparisons between the $\Theta$-SAP model and these other models. Note that for each chiral $K \in \mathcal{K}$, the after-strand-passage polygons we observed were all in the same chirality class; for example, all of the after-strand-passage trefoils we observed have the same handedness, as do all of the after-strand-passage five-crossing knots.

Property $*$          Frequency
$\Theta_0$            2491776147
$\Theta_0(\phi)$      2459748925
$\Theta_0(3_1)$       31161421
$\Theta_0(4_1)$       828162
$\Theta_0(5_2)$       36596
Other                 1029

Table 3. The number of polygons observed across the ten replications and the fourteen chains that have property $*$.
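The rough orders of magnitude quoted above follow from the raw frequencies in table 3. These pooled fractions mix all lengths and all fourteen chains, so, as noted next, they are not estimates of the limits themselves.

```python
import math

TOTAL = 2_491_776_147   # Theta_0-SAPs observed (table 3)
COUNTS = {
    "unknot": 2_459_748_925,
    "3_1": 31_161_421,
    "4_1": 828_162,
    "5_2": 36_596,
    "other (> 5 crossings)": 1_029,
}


def order_of_magnitude(x):
    """Largest integer k with 10**k <= x."""
    return math.floor(math.log10(x))


for knot, count in COUNTS.items():
    frac = count / TOTAL
    print(f"{knot:>22}: {frac:.3e}  ~ 10^{order_of_magnitude(frac)}")
```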

Although the frequencies presented in table 3 can be used to estimate the approximate order of magnitude of $\rho^{\Theta_0}(K)$, they cannot be used to estimate $\rho^{\Theta_0}(K)$, $K \in \mathcal{K}$, directly. Instead, in order to study conjectures 3 and 4, we suppose that the polygon counts for $n$-edge $*$-SAPs, for $* \in \{G, G(K) : G \in \mathcal{G}, K \in \mathcal{K}\}$, have the asymptotic form

$$A_*\, n^{\alpha_*} e^{\kappa_0 n}\left(1 + \frac{B_*}{n^{\Delta_*}} + O(n^{-1})\right), \qquad (27)$$

for some constants (independent of $n$) $B_*$ and $\Delta_* > 0$. From this, it can be shown that, for $G(K)$-SAPs with $G \in \mathcal{G}$, as $n \to \infty$, there exist other constants $B_G(K)$ and $\Delta_G(K) > 0$ such that

$$\rho_n^{G}(K) \approx \rho^{G}(K) + B_G(K)\, n^{-\Delta_G(K)}. \qquad (28)$$
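A brief sketch of how (28) follows from (27), assuming (as in conjecture 2) that $\alpha_{G(K)} = \alpha_G$, so that the powers of $n$ and the exponential factors cancel in the ratio of counts:

$$\rho_n^{G}(K) = \frac{p_n^{G}(K)}{p_n^{G}} = \frac{A_{G(K)}}{A_G}\cdot\frac{1 + B_{G(K)}\,n^{-\Delta_{G(K)}} + O(n^{-1})}{1 + B_G\,n^{-\Delta_G} + O(n^{-1})} = \frac{A_{G(K)}}{A_G}\left(1 + B_{G(K)}\,n^{-\Delta_{G(K)}} - B_G\,n^{-\Delta_G} + \cdots\right).$$

The leading correction is the term with the smaller of $\Delta_{G(K)}$ and $\Delta_G$ (when that exponent is below 1), which gives the form (28) with $\rho^G(K) = A_{G(K)}/A_G$.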

In general, the variability in any estimate of $\rho_n^{G}(K)$ increases with $n$. In order to reduce this variability when estimating the limit $\rho^{G}(K)$, we use "grouped-$n$" polygon counts. Specifically, for positive even integers $n_2 > n_1$, we focus on polygons whose lengths are in the interval $[n_1, n_2]$ and define the $[n_1, n_2]$-grouped probability (or grouped probability, for short)



E 



,(*) == 



M 



xP (K) V m( " )e 

1=1 



M 

q v^ UJ(n)e 



',= l 



where each sum is taken through even values of n, w(n) = (n — 6)n' 
normalizing sum in the denominator of ir UJ (q, Zi) from equation (21). 



(29) 



and Q(j3i) is the 
$K$         $\rho^{\Theta_0}(K)$              p-value
$\phi$      0.97774 ± 0.00105 (0.005)     0.42
$3_1$       0.02163 ± 0.00055 (0.003)     0.39
$4_1$       0.00089 ± 0.00018 (0.0005)    0.17

Table 4. Estimates for the limiting probabilities $\rho^{\Theta_0}(K)$ determined by fitting an equation of the form $f(x) = b + m x^{r}$ to the estimated grouped probabilities displayed in figure 7. The estimates are of the form: point estimate ± 95% margin of error (systematic error). The p-value presented is the p-value associated with a $\chi^2$-test for goodness of fit between the grouped probabilities estimated from the data and the grouped probabilities predicted using the fitted equation.

For any given $G \in \mathcal{G}$ and $K \in \mathcal{K}$, substituting the scaling form (27) into equation (29) shows, to first order, that there exists $N_{\min} > 0$ such that

$$\rho^{G}_{n_1,n_2}(K) \approx f^{G}_{K}(n_1) := \rho^{G}(K) + m_G(K)\, n_1^{\chi_G(K)}, \qquad (30)$$

for all $n_1 > N_{\min}$, for some constants $m_G(K)$ and $\chi_G(K) < 0$, and with $\rho^G(K)$ as in conjecture 4. Thus, if the limit in conjecture 4 exists, the grouped probabilities have the same limit as the non-grouped probabilities.

For $G \in \mathcal{G}$, over the interval in which we have reliable data, we choose the non-overlapping intervals $[n_1, n_2]$ so that the interval lengths $d_G(K) = |n_2 + 2 - n_1|$ are all constant and the estimates for $\sum_{n=n_1}^{n_2} p_n^{G}(K)$ are essentially independent of the estimates for $\sum_{n=n_1+d_G(K)}^{n_2+d_G(K)} p_n^{G}(K)$. A procedure for estimating $d_G(K)$ is given in [7]. Using this procedure, we estimated that $d_{\Theta_0}(\phi) = 100$, $d_{\Theta_0}(3_1) = 140$, and $d_{\Theta_0}(4_1) = 160$. For these choices of $d_G(K)$, the corresponding ratio estimates for the grouped probabilities from our Monte Carlo data are displayed in figure 7; that is, figure 7 displays our estimated values for $\rho^{\Theta_0}_{n_1,n_2}(\phi)$ (for $n_1 \in \{14, 114, 214, \ldots, 1914\}$), $\rho^{\Theta_0}_{n_1,n_2}(3_1)$ (for $n_1 \in \{24, 164, 304, \ldots, 1844\}$), and $\rho^{\Theta_0}_{n_1,n_2}(4_1)$ (for $n_1 \in \{30, 190, 350, \ldots, 830\}$) versus $n_1$. This figure also displays each of our fitted equations $f^{\Theta_0}_K(n_1)$ (from (30)) for $\rho^{\Theta_0}_{n_1,n_2}(\phi)$, $\rho^{\Theta_0}_{n_1,n_2}(3_1)$, and $\rho^{\Theta_0}_{n_1,n_2}(4_1)$ versus $n_1$.

We focus on the sets of observed $\Theta_0(\phi)$-, $\Theta_0(3_1)$-, and $\Theta_0(4_1)$-SAPs because, for these SAPs, our fitted equation provides a "good fit" to the grouped probability estimates (for sufficiently large $n_1 > N_{\min}$) over the range of reliable data. By "good fit" we mean that the p-value (cf. table 4) associated with a $\chi^2$-test for goodness of fit is larger than 0.05. Table 4 contains our estimates (from the fit) for the limiting probabilities $\rho^{\Theta_0}(\phi)$, $\rho^{\Theta_0}(3_1)$, and $\rho^{\Theta_0}(4_1)$. The estimates have the form

point estimate ± 95% margin of error (systematic error).

The systematic error is estimated by taking the maximum difference between the grouped probability point estimates over the region $[14, 2012]$ and the corresponding estimated limiting probability; this measures the error resulting from the uncertainty in the choice of $N_{\min}$.
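The fits in table 4 use the form $f(x) = b + m x^{r}$. The fitting routine used for table 4 is not described here, so the sketch below shows one simple way to do such a fit: for each fixed exponent $r$ on a grid, the model is linear in $(b, m)$ and weighted least squares has a closed form; the $r$ with the smallest $\chi^2$ is kept. With $N$ grouped points and three fitted parameters, the goodness-of-fit p-value could then be obtained from the $\chi^2$ survival function with $N - 3$ degrees of freedom, e.g. `scipy.stats.chi2.sf(chi2, N - 3)`. The check uses synthetic, noise-free data, not the paper's estimates.

```python
def fit_offset_power_law(x, y, sigma, r_grid):
    """Weighted least-squares fit of y ~ b + m * x**r.

    For each fixed exponent r the model is linear in (b, m), so the 2x2 normal
    equations are solved exactly; the r giving the smallest chi-square wins.
    Returns (b, m, r, chi2).
    """
    w = [1.0 / s ** 2 for s in sigma]
    best = None
    for r in r_grid:
        u = [xi ** r for xi in x]
        sw = sum(w)
        su = sum(wi * ui for wi, ui in zip(w, u))
        suu = sum(wi * ui * ui for wi, ui in zip(w, u))
        sy = sum(wi * yi for wi, yi in zip(w, y))
        suy = sum(wi * ui * yi for wi, ui, yi in zip(w, u, y))
        m = (sw * suy - su * sy) / (sw * suu - su * su)
        b = (sy - m * su) / sw
        chi2 = sum(wi * (yi - b - m * ui) ** 2 for wi, ui, yi in zip(w, u, y))
        if best is None or chi2 < best[3]:
            best = (b, m, r, chi2)
    return best


# Synthetic check (not the paper's data): noise-free values of b + m * n1**r
n1 = list(range(14, 1915, 100))                  # 14, 114, ..., 1914
y = [0.97774 - 0.5 * n ** -0.5 for n in n1]      # b = 0.97774, m = -0.5, r = -0.5
fit = fit_offset_power_law(n1, y, [0.001] * len(n1), [-k / 20 for k in range(1, 41)])
print(fit)
```

A grid over $r$ keeps the example dependency-free. A general nonlinear least-squares routine would give the same kind of fit without the grid.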

We also analyzed our data for the knot-types with five or more crossings; however, the resulting fits had p-values below 0.01, and hence the corresponding estimated limiting knot probabilities were deemed unreliable. It should be noted, however, that these estimates did not contradict conjecture 3.

For each of the limiting probabilities $\rho^{\Theta_0}(\phi)$, $\rho^{\Theta_0}(3_1)$, and $\rho^{\Theta_0}(4_1)$, the corresponding 95% confidence interval lies completely within the interval $(0,1)$. Hence we have strong evidence that conjecture 3 holds.

Figure 7. The grouped probability estimates (×) for $\rho^{\Theta_0}_{n_1,n_2}(\phi)$ (for $n_1 \in \{14, 114, 214, \ldots, 1914\}$) and the fitted curve $f^{\Theta_0}_{\phi}(n_1)$. The grouped probability estimates (□) for $\rho^{\Theta_0}_{n_1,n_2}(3_1)$ (for $n_1 \in \{24, 164, 304, \ldots, 1844\}$) and the fitted curve $f^{\Theta_0}_{3_1}(n_1)$. The grouped probability estimates (○) for $\rho^{\Theta_0}_{n_1,n_2}(4_1)$ (for $n_1 \in \{30, 190, 350, \ldots, 830\}$) and the fitted curve $f^{\Theta_0}_{4_1}(n_1)$. The error bars are the estimated 95% confidence intervals.



5. Dependence of the After-Strand-Passage Knotting Probabilities on the Local Juxtaposition

In this section we explore how the knotting probabilities depend on the local geometry about $\Theta$ and how they depend on the scheme used to classify the compactness of these local geometries. We begin by exploring the effect that a small change in the local geometry has on the knotting probability. Note that, to reduce statistical error, all estimates presented in this section are based on grouped probabilities for grouped polygon lengths $n_1$ to $n_2 = n_1 + 118$. Also, in this section, given a specific geometry $G$ and polygon length $n$, the phrase "knotting probability" refers to the $G$-restricted knotting probability $1 - \rho_n^{G}(\phi)$, i.e. the probability that the after-strand-passage polygon is knotted.

To determine the influence that the local geometry can have on the knotting probabilities, we focus on the three juxtapositions $S$, $L$, and $Z$ illustrated in figure 2, because: (1) the estimated knotting probabilities for $Z^-$ and $Z^+$ are, respectively, the highest and lowest amongst all the juxtapositions (cf. figure 8); (2) in our CMC sample, the number of $\Theta_0^-$-SAPs that contain $S$ (17237) is roughly equal to the number of $\Theta_0^-$-SAPs that contain $Z$ (15155); (3) $S$ resembles the half-hooked juxtaposition of [12]; and (4) $L$ differs from both $S$ and $Z$ by only one edge.
$J$      $J^-$              $J^+$               $J$ (unsigned)
$Z$      0.629 ± 0.066      0.0013 ± 0.0002     0.0287 ± 0.0017
$L$      0.282 ± 0.036      0.0007 ± 0.0005     0.0223 ± 0.0052
$S$      0.077 ± 0.012      0.0037 ± 0.0013     0.0153 ± 0.0042

Table 5. Column 2 displays estimates and 95% confidence intervals for the limiting knotting probabilities for $Z^-$, $L^-$, and $S^-$. Column 3 displays estimates and 95% confidence intervals for the limiting knotting probabilities for $Z^+$, $L^+$, and $S^+$. Column 4 displays estimates and 95% confidence intervals for the limiting knotting probabilities for $Z$, $L$, and $S$.

We first explore the influence of the juxtaposition and the crossing-sign on the knotting probabilities for juxtapositions $S^-$, $S^+$, $Z^-$, and $Z^+$. Figure 8 displays (on a log scale) estimates of the relevant knotting probabilities (along with those for $L^-$ and the unsigned $S$ and $Z$) versus polygon length $n_1$. It is clear from this figure that, for each value of $n_1$, the estimated knotting probabilities for $S^-$ and $S^+$, and for $Z^-$ and $Z^+$, are statistically distinct, with the most dramatic difference being between the estimates for $Z^-$ and $Z^+$. If the crossing-sign is ignored, then the associated estimated knotting probabilities (those for the unsigned $S$ and $Z$) in figure 8 are also statistically distinct for each displayed value of $n_1$, but the difference is now smaller than that observed, for example, between $S^-$ and $Z^-$. Hence the knotting probabilities associated with juxtapositions can depend strongly on the crossing-sign at the strand passage site. From conjecture 4, we expect that the knotting probabilities associated with the juxtapositions (whether signed or unsigned) will approach a juxtaposition-dependent constant as $n \to \infty$. Figure 8 provides evidence of this: for each juxtaposition, the point estimates appear to be approaching distinct limiting values. The estimated limiting knotting probabilities presented in table 5 also support this.

We can say more about the influence of the local geometry at the strand passage site. Although $S$ and $Z$ each differ from $L$ by one edge (cf. figure 2), for each value of $n_1$, the estimated knotting probabilities for $S^-$, $Z^-$, and $L^-$ (as displayed in figure 8) are clearly statistically distinct. Moreover, the knotting probability for $Z^-$ is approximately double that for $L^-$, and the knotting probability for $S^-$ is approximately one-fifth that for $L^-$. From the estimates in table 5, the limiting knotting probabilities associated with these three signed juxtapositions are also statistically different. Clearly a small change in the local juxtaposition can have a significant impact on the associated probabilities of knotting. Further, if the crossing-sign dependence is ignored, then the associated estimates for the limiting knotting probabilities (those for $Z$, $L$, and $S$ in table 5) are also statistically different. We thus conclude that the knotting probabilities are affected by the local juxtaposition, whether signed or unsigned.
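One way to make "statistically different" concrete for table 5 is an approximate two-sample $z$-test built from the reported 95% margins of error. This is our illustrative choice, not necessarily the test used for this section. It treats each pair of estimates as independent and normal, which is only approximate because all estimates come from the same CMC run. The test is also less conservative than requiring non-overlapping 95% intervals: the intervals for the unsigned $Z$ and $L$, for instance, overlap slightly.

```python
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)   # two-sided 5% critical value, about 1.96

# Table 5: limiting knotting probabilities as (estimate, 95% margin of error)
TABLE5 = {
    "Z-": (0.629, 0.066), "L-": (0.282, 0.036), "S-": (0.077, 0.012),
    "Z": (0.0287, 0.0017), "L": (0.0223, 0.0052), "S": (0.0153, 0.0042),
}
PAIRS = [("Z-", "L-"), ("L-", "S-"), ("Z-", "S-"), ("Z", "L"), ("L", "S"), ("Z", "S")]


def z_score(est1, me1, est2, me2):
    """z statistic for est1 - est2, treating the estimates as independent and normal."""
    se1, se2 = me1 / Z95, me2 / Z95
    return abs(est1 - est2) / (se1 ** 2 + se2 ** 2) ** 0.5


for a, b in PAIRS:
    z = z_score(*TABLE5[a], *TABLE5[b])
    print(f"{a:>2} vs {b:<2}: z = {z:5.2f}, different at the 5% level: {z > Z95}")
```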

We thus have strong evidence that the crossing-sign and a very minor change in the local juxtaposition at the strand passage site influence the knotting probabilities, with the most dramatic influence occurring when the crossing-sign is not ignored. In fact, depending on the juxtaposition geometry $G$ at the strand passage site, one can either preferentially knot an unknotted polygon (if $G = Z^-$, for example) or preferentially keep it unknotted (if $G = Z^+$).

We now turn our attention to the influence of juxtaposition compactness on the limiting knotting probabilities. For $m^-$-SAPs, $m \in \mathcal{M}$, we first study how the proportion of $m^-$-SAPs depends on $m$. The second column in table 6 displays 95% confidence intervals for the limiting proportion of $m^-$-SAPs. These estimates increase from $m = 14$ to 18 and then decrease from $m = 18$ to 22. On the other hand, if we instead consider the number of $(-)$ juxtapositions that have a given $m^-$-size, then the number with size $m^- = 14, 16, 18, 20, 22$ is, respectively, 1, 10, 37, 60, 36. These numbers increase from $m = 14$ to 20 and then decrease from $m = 20$ to 22, a slightly different trend from that observed for the proportion of polygons in each size class. Thus the numbers of $m^-$-size juxtapositions for $m = 14$ to 22 do not determine the relative proportions of $m^-$-SAPs.

Figure 8. The grouped-$n$ estimates for the $G$-restricted knotting probabilities (on a log scale) for $\Theta$-SAPs containing $G = S$, $Z$, the four associated signed juxtapositions, and $L^-$. The error bars are estimated 95% confidence intervals, $n_1 \in \{14, 134, \ldots, 1454\}$, and $n_2 = n_1 + 118$.

$m$     Limiting proportion     Limiting knotting probability
14      0.0222 ± 0.0002         0.0013 ± 0.0002
16      0.1695 ± 0.0005         0.0027 ± 0.0010
18      0.3867 ± 0.0023         0.0068 ± 0.0003
20      0.3165 ± 0.0013         0.0269 ± 0.0005
22      0.1032 ± 0.0010         0.1165 ± 0.0074

Table 6. In column 2, estimates of the limiting proportion of $m^-$-SAPs along with 95% margins of error. In column 3, estimates of the limiting knotting probability for $m^-$-SAPs along with 95% margins of error.

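To make the contrast between juxtaposition counts and polygon proportions concrete, the short Python sketch below uses the table 6 point estimates and the (-) juxtaposition counts quoted in the text. The "uniform baseline", which assumes every juxtaposition is equally represented among the polygons, is a hypothetical reference introduced only for comparison; it is not part of the model.

```python
# Table 6 point estimates of the limiting proportions rho^{m-} and the number of
# (-) juxtapositions in each m^- size class, both taken from the text.
sizes = [14, 16, 18, 20, 22]
proportions = [0.0222, 0.1695, 0.3867, 0.3165, 0.1032]
juxtaposition_counts = [1, 10, 37, 60, 36]

total = sum(juxtaposition_counts)  # 144 (-) juxtapositions in all
# Hypothetical baseline: the share each class would have if every juxtaposition
# were equally represented among the polygons.
uniform_share = [c / total for c in juxtaposition_counts]

def peak(values):
    """Return the size class m at which a sequence attains its maximum."""
    return sizes[max(range(len(values)), key=values.__getitem__)]

print(total, peak(proportions), peak(juxtaposition_counts))  # 144 18 20
for m, p, u in zip(sizes, proportions, uniform_share):
    print(f"m = {m}: estimated proportion {p:.4f}, uniform baseline {u:.4f}")
```

The baseline peaks at $m = 20$ while the estimated proportions peak at $m = 18$, which is the mismatch noted in the text.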
Figure 9 displays our estimates for $\rho^{m^-}_{n_1,n_2}(\phi)$. As $n_1$ goes to infinity, these estimates are expected to approach the limiting knotting probability for $m^-$-SAPs, $\rho^{m^-}(\phi)$. Table 6 displays the estimates of these limiting knotting probabilities along with computed 95% confidence intervals.
Figure 9. The grouped-$n$ estimates for $\rho^{m^-}_{n_1,n_2}(\phi)$, for (from bottom to top) $m \in \{14, 16, 18, 20, 22\}$ and $n_1 \in \{14, 134, 254, \ldots, 1454\}$. The error bars represent 95% confidence intervals.



The juxtaposition ($Z^+$-mirror) associated with size $14^-$ forms the tightest juxtaposition. Note that in figure 9 the grouped-$n$ estimates for $\rho^{14^-}_{n_1,n_2}(\phi)$, for each $n_1 \in \{14, 134, 254, \ldots, 1454\}$, are all very close to zero. Consequently the limiting knotting probability, $\rho^{14^-}(\phi)$, is also expected to be very close to zero; this is consistent with the observation in [12] regarding their tightest (hooked) juxtaposition, that when starting with an unknot, the after-virtual-strand-passage polygons are essentially always unknotted. They also comment that their after-virtual-strand-passage polygons are more knotted as the juxtaposition becomes less tight. Our results in table 6 are consistent with this trend since, statistically, they satisfy:

$$\rho^{14^-}(\phi) < \rho^{16^-}(\phi) < \rho^{18^-}(\phi) < \rho^{20^-}(\phi) < \rho^{22^-}(\phi). \qquad (31)$$

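One simple, conservative way to read "statistically" in (31) is that consecutive 95% confidence intervals do not overlap. The text does not state which comparison was used, so the Python check below is an illustration under that assumption. It uses the table 6 values and, for contrast, the unsigned $n_1 = 14$ estimates quoted later in this section.

```python
# Point estimates and 95% margins of error, stored as (estimate, margin).
# Signed classes m^-: limiting knotting probabilities rho^{m-}(phi) from table 6.
signed_estimates = {
    14: (0.0013, 0.0002),
    16: (0.0027, 0.0010),
    18: (0.0068, 0.0003),
    20: (0.0269, 0.0005),
    22: (0.1165, 0.0074),
}
# Unsigned classes m: grouped estimates for n1 = 14, as quoted in the text.
unsigned_n1_14 = {
    14: (0.00502, 0.00013),
    16: (0.00551, 0.00006),
    18: (0.00613, 0.00005),
    20: (0.00681, 0.00007),
    22: (0.00706, 0.00025),
}

def strictly_ordered(table):
    """True if, for consecutive sizes, each 95% interval lies entirely below the next one."""
    sizes = sorted(table)
    for small, large in zip(sizes, sizes[1:]):
        est_s, margin_s = table[small]
        est_l, margin_l = table[large]
        if est_s + margin_s >= est_l - margin_l:
            return False
    return True

print(strictly_ordered(signed_estimates))  # True: every consecutive pair is separated
print(strictly_ordered(unsigned_n1_14))    # False: the size-20 and size-22 intervals overlap
```

Non-overlap is a stringent criterion: overlapping intervals, as for the unsigned sizes 20 and 22, do not by themselves rule out a significant difference.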
This increasing trend with $m^-$ cannot be attributed to any trend in the proportions of polygons in classes $14^-$, $16^-$, $18^-$, $20^-$, and $22^-$ because, as previously noted, there is no strictly increasing trend in these proportions.

It should be pointed out, however, that the trend observed in equation (31) is an average property of the compactness classes. Distinct juxtapositions having the same $m^-$-size can have quite different limiting knotting probabilities. Furthermore, there exist distinct juxtapositions with different $m^-$-sizes whose limiting knotting probabilities follow the reverse of the trend in equation (31). The observed trend of equation (31) also depends on the choice of compactness measure; for other possible measures, such as the dimensions of the smallest box containing a juxtaposition, this trend is not observed.

The compactness results just presented take crossing-sign into account. If it is ignored, then amongst the 144 possible juxtapositions, the numbers with compactness size $m$ = 14, 16, 18, 20, 22 are, respectively, 2, 20, 68, 50, and 4. Figure 10 displays the grouped probability estimates for $\rho^{m}_{n_1,n_2}(\phi)$ plotted versus $n_1 \geq 134$. The corresponding estimates for $n_1 = 14$, which are an order of magnitude smaller than the others, are not plotted but are given here for $m$ = 14, 16, 18, 20, 22, respectively: 0.00502 ± 0.00013, 0.00551 ± 0.00006, 0.00613 ± 0.00005, 0.00681 ± 0.00007, and 0.00706 ± 0.00025. For this smallest choice of $n_1$, the point estimates (although very close to zero) do follow the general trend that the more compact the polygon is around the strand passage site, the lower the associated probability of knotting. For the estimates in figure 10, however, no such trend is present. The most we can say is that compactness (when juxtaposition sign is ignored) does affect the associated limiting knotting probabilities.
Figure 10. The grouped probability estimates of $\rho^{m}_{n_1,n_2}(\phi)$ for $m \in \{14, 16, 18, 20, 22\}$ and $n_1 \in \{134, 254, \ldots, 1454\}$. Only the error bars (which represent 95% confidence intervals) for compactness sizes 18, 20, and 22 are shown. The estimated error bars for compactness sizes 14 and 16 are at least double the associated compactness size-20 error bar.

Since crossing-sign plays a role, and since experimental results (see [21, Fig. 1]) indicate that topo IV can have a preference for changing a (+) to a (-) crossing, we focus on $\Theta^+$-SAPs when investigating the influence of the opening angle on knotting probability. The results for $\Theta^+$-SAPs are obtained from our CMC sample of $\Theta^-$-SAPs by considering their mirror images. Recall that the opening angle was defined so that a juxtaposition $J^+$ has the same angle as $J^+$-mirror. We first investigate the knotting probability for $\Theta^+$-SAPs as a function of the opening angle, grouping together all polygons with lengths from $n_1 = 134$ to $n_2 = 252$. The knotting probability for each $J^+$, $J \in \mathcal{J}$, is plotted versus opening angle in figure 11(a). The plot shows a positive correlation between opening angle and knotting probability. The same trend was observed (plots not shown here) for other choices of $n_1$, from $n_1 = 14$ to $n_1 = 614$, and this was tested statistically. The correlation coefficients for $n_1$ = 14, 134, 254, 374, 494, 614 are, respectively, 0.718, 0.781, 0.794, 0.794, 0.801, 0.804. In each case, we tested the null hypothesis that the true correlation is zero against the alternative that it is positive. Every p-value is less than 0.001, hence we conclude that each correlation is positive. Thus, on average, the probability of knotting increases as the opening angle increases from 0° to 180°; equivalently, the more acute the opening angle, the less likely (on average) the juxtaposition is to knot an unknot.

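The text does not state which test of zero versus positive correlation was used. As a dependency-free stand-in, the Python sketch below runs a one-sided permutation test on Pearson's $r$; the function names are hypothetical and the angle/probability data are synthetic placeholders, not values from figure 11.

```python
import math
import random

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def one_sided_permutation_pvalue(x, y, n_perm=5000, seed=1):
    """Permutation test of H0: no association vs H1: positive correlation.

    Returns (r_obs, p), where p estimates P(r >= r_obs) when y is randomly
    re-paired with x; the +1 terms keep the estimate valid and strictly positive.
    """
    rng = random.Random(seed)
    r_obs = pearson_r(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if pearson_r(x, y_perm) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)

# Synthetic placeholder data (not from figure 11): opening angles in degrees and
# knotting probabilities that grow with the angle.
angles = [0, 30, 45, 53.13, 60, 90, 120, 135, 150, 180]
knot_probs = [0.002 * math.exp(a / 60) for a in angles]
r_obs, p_value = one_sided_permutation_pvalue(angles, knot_probs)
print(f"r = {r_obs:.3f}, one-sided p = {p_value:.4f}")
```

A permutation test avoids the bivariate-normality assumption behind the usual $t$-test for $r$, but, like that test, it ignores the sampling uncertainty in each plotted probability.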
To explore this further, figure 11(b) presents the angle-dependent knotting probability obtained by grouping together the polygons whose juxtapositions have the same opening angle. The figure supports the conclusion that the knotting probability increases as the opening angle increases, and the dependence appears very close to linear on the log scale over the interval [0°, 180°]. To show that this trend continues as polygon length increases, figure 12 displays the angle-dependent grouped-$n$ knotting probabilities for five angles (0°, 53.13°, 90°, 135°, 180°) versus $n_1$. The trend persists at all lengths. We expect the knotting probabilities studied here to approach constants as $n_1 \to \infty$, and, for each angle, the point estimates appear to be approaching distinct angle-dependent limiting values.

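Since the angle dependence in figure 11(b) looks close to linear on a log scale, one natural summary is a least-squares fit of $\log p$ against opening angle. The Python sketch below is an illustration under that assumption, not the authors' analysis: the five angles are those of figure 12, but the probabilities are synthetic and exactly log-linear, so the fit should recover the generating slope and intercept.

```python
import math

def fit_log_linear(angles, probs):
    """Ordinary least-squares fit of log(p) = a + b * angle; returns (a, b)."""
    ys = [math.log(p) for p in probs]
    n = len(angles)
    mx, my = sum(angles) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in angles)
    sxy = sum((x - mx) * (y - my) for x, y in zip(angles, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b

angles = [0.0, 53.13, 90.0, 135.0, 180.0]  # the five angles of figure 12
probs = [0.001 * math.exp(0.02 * x) for x in angles]  # synthetic, exactly log-linear
a, b = fit_log_linear(angles, probs)
print(f"exp(a) = {math.exp(a):.6f}, slope b = {b:.4f} per degree")
```

An unweighted fit treats every angle equally; with real estimates, weighting each $\log p$ by its inverse variance would be the more careful choice.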
6. Summary and Discussion 

Through a local action (strand passage) on DNA, topoisomerase enzymes can efficiently change the knot-type of a DNA molecule. Just how the enzyme determines the local position within the DNA, and to what extent randomness is involved, are open questions. Motivated by
these questions, here we have presented a lattice polygon model for a local strand passage (cf. 
section 2), explored the asymptotic properties (as polygon length tends to infinity) associated with 
the model (cf. section 2.1), reviewed tools to simulate the model (cf. section 3) and to obtain 
statistical estimates (cf. sections 4.1-4.2), and then applied these tools to the simulated data to 
explore the asymptotic properties (cf. sections 4.1-5). 

We prove that the number of $n$-edge unknotted polygons that contain a fixed structure grows at the same exponential rate ($\kappa_0$) as the number of $n$-edge unknotted polygons. We review a new maximum likelihood technique to estimate exponential growth rates and critical exponents using data generated from a composite Markov chain. Using this technique, we obtain estimates for $\kappa_0$ that are consistent with other known estimates for $\kappa_0$. We provide numerical evidence to support the conjecture that the limiting probabilities associated with different after-strand-passage properties exist and lie strictly in (0,1). We also show not only that the local geometry around the strand passage site influences the limiting probabilities of knotting, but also that combining this information with crossing-sign information has an even greater influence on these probabilities. We further show, using two compactness measures that take the crossing-sign information into account, that as the local juxtaposition of a $\Theta$-SAP becomes more and more compact, the limiting probability of knotting associated with the compactness class decreases.

Others [40] have noted that the local geometry of the strand-passage site could play a role 
in the topoisomerase-DNA interaction. Our work suggests that the crossing-sign at the strand 
passage site is also an important factor to consider, and this is consistent with experimental results 
which indicate that some type II topoisomerases exhibit a chirality bias. Topoisomerase acts very locally on DNA, i.e. it acts on the DNA in the space occupied by the topoisomerase. Consequently it acts within a finite volume, a very small volume compared with the volume of the DNA itself.
Figure 11. (a) The grouped probability estimates for the knotting probability for $J^+$-SAPs whose lengths are between $n_1 = 134$ and $n_2 = 252$ edges inclusive, plotted versus opening angle $\alpha$ (degrees). The error bars presented are estimated 95% confidence intervals for $J^+ \in \{Z^-, S^-, S^+, Z^+\}$; the error bars for all other juxtapositions are estimated to be smaller. (b) The grouped probabilities for the angle-dependent knotting probabilities for $\alpha^+$-SAPs whose lengths are between $n_1 = 134$ and $n_2 = 252$ edges inclusive, plotted versus opening angle. The error bars are estimated 95% confidence intervals, most of which are too small to appear clearly on the graph.



Our model indicates that if the topoisomerase can take into account the crossing-sign information at the strand passage site, then it can make a change in a small volume and either preferentially knot an unknot (for our $Z^-$ juxtaposition, about 63% of the time a strand passage transforms the unknot into a knot) or preferentially leave it unknotted (for $Z^+$, only 0.13% of the time does a strand passage result in a knot).

Figure 12. The grouped-$n$ estimates for $\rho^{\alpha^+}_{n_1,n_2}(\phi)$, for (from bottom to top) $\alpha \in \{180, 135, 90, 53, 0\}$ (degrees) and $n_1 \in \{14, 134, 254, \ldots, 1454\}$. The error bars represent 95% confidence intervals.
Appendix A. The CMC MLE method

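For the CMC MLE method, we assume that a sequence $\{\Omega^{(t)},\ t = 1, \ldots, T\}$ of $M$-tuples $\Omega^{(t)} = (\omega_1^{(t)}, \ldots, \omega_M^{(t)})$ of SAPs from a set $\mathcal{S}$ has been generated from a CMC Monte Carlo algorithm with equilibrium distribution $\pi(q, \beta)$; in our case, the SAPs are $\Theta^-$-SAPs and the distribution is given by

$$\pi(q, \beta) := \{\pi_{\omega}(q, z),\ \omega \in \mathcal{S}^M\}, \qquad \text{(A.1)}$$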

where $\beta = (\beta_1, \ldots, \beta_M)$ with $\beta_i < -\kappa_0$ for each $i$, $z = (e^{\beta_1}, e^{\beta_2}, \ldots, e^{\beta_M})$, and $\pi_{\omega}(q, z)$ is as in equation (23). Let $s_n$ denote the number of $n$-edge polygons in $\mathcal{S}$. It is also assumed that, for any of the polygon subsets of interest (the $*$-SAPs, where $*$ ranges over the after-strand-passage properties considered, including specific knot-types $K \in \mathcal{K}$), the number of $n$-edge polygons in the subset, $s^*_n$, satisfies, for all $n \geq N_{\min}$,

$$s^*_n = A_*\,(n + h_*)^{\alpha_*}\,e^{n\kappa_0}. \qquad \text{(A.2)}$$

Let $s^{\bar{*}}_n = s_n - s^*_n$; analogous assumptions are made about the asymptotic behaviour of $s^{\bar{*}}_n$. Note that the $h_*$ in this general asymptotic form is included to account for some of the effects of the unknown $o(1)$ term in equation (11). Finally, we assume that the probability, denoted by $Q_{N_{\min}}(\beta_i)$, that a chain $i$ polygon's length is less than $N_{\min}$ under distribution $\pi(q, \beta)$ is an unknown parameter for each $i = 1, \ldots, M$. With these model assumptions, the goal of the method is to obtain maximum likelihood estimates for $\alpha_*$, $\kappa_0$, and the other unknown parameters from the CMC Monte Carlo data. To do this, given a specific subset or "property" $*$, a log-likelihood function must be defined; it is given below.

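Under (A.2) and the analogous form for $s^{\bar{*}}_n$ (both valid for $n \geq N_{\min}$, with the same growth rate $\kappa_0$, and with $s_n = s^*_n + s^{\bar{*}}_n$), the exponential factors cancel in the fraction of polygons with property $*$. The short derivation below, added for illustration, shows why only the amplitudes and exponents govern the limiting probability; the amplitude ratio $\Delta$ is introduced here for convenience.

```latex
\frac{s^*_n}{s_n}
  = \frac{A_*\,(n+h_*)^{\alpha_*}\,e^{n\kappa_0}}
         {A_*\,(n+h_*)^{\alpha_*}\,e^{n\kappa_0} + A_{\bar{*}}\,(n+h_{\bar{*}})^{\alpha_{\bar{*}}}\,e^{n\kappa_0}}
  = \frac{\Delta\,(n+h_*)^{\alpha_*}}{\Delta\,(n+h_*)^{\alpha_*} + (n+h_{\bar{*}})^{\alpha_{\bar{*}}}},
  \qquad \Delta := \frac{A_*}{A_{\bar{*}}}.

\text{If } \alpha_* = \alpha_{\bar{*}}:\quad
  \frac{(n+h_*)^{\alpha_*}}{(n+h_{\bar{*}})^{\alpha_{\bar{*}}}} \to 1
  \;\Longrightarrow\;
  \frac{s^*_n}{s_n} \to \frac{\Delta}{1+\Delta} \in (0,1);
\qquad
\text{if } \alpha_* < \alpha_{\bar{*}}:\quad \frac{s^*_n}{s_n} \to 0.
```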
Note that the model's asymptotic form (from equation (A.2)) depends only on polygon lengths and the property $*$. Furthermore, the model applies only to sufficiently large polygon lengths. At the same time, very large polygons are rare events in a Markov chain generated from the $\Theta$-BFACF algorithm. Thus we concentrate on polygon observations in a restricted polygon-length interval where the model is expected to apply and there is sufficient data for reliable estimates. For this purpose, given a property $*$ and fixed even positive integers $N$ and $N'$ (the boundaries of the polygon-length interval of interest) such that $14 \leq N < N'$, we define the following four indicator functions. For $\omega \in \mathcal{S}$,

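$$I_*(\omega) := \begin{cases} 1, & \text{if } \omega \text{ has property } *,\\ 0, & \text{otherwise,} \end{cases} \qquad \text{(A.3)}$$

and for any even non-negative integer $n$, define

$$I^{(1)}(n) := \begin{cases} 1, & \text{if } 0 \leq n < N,\\ 0, & \text{otherwise,} \end{cases} \qquad \text{(A.4)}$$

$$I^{(2)}(n) := \begin{cases} 1, & \text{if } N \leq n \leq N',\\ 0, & \text{otherwise,} \end{cases} \qquad \text{(A.5)}$$

and

$$I^{(3)}(n) := \begin{cases} 1, & \text{if } n > N',\\ 0, & \text{otherwise.} \end{cases} \qquad \text{(A.6)}$$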

Next, let $|\omega|$ denote the length of a polygon $\omega \in \mathcal{S}$. Our interest is in two functions, defined on $\mathcal{S}$ in terms of these indicator functions:

$$X(\omega) := I^{(2)}(|\omega|)\,|\omega| + I^{(3)}(|\omega|)\,(N' + 1), \qquad \text{(A.7)}$$

which, within the restricted interval, keeps track of the actual polygon length, while, outside the restricted interval, it only records whether the length is below or above the interval boundaries; and

$$X_*(\omega) := I^{(2)}(|\omega|)\,I_*(\omega), \qquad \text{(A.8)}$$

which records whether a polygon in the restricted length interval has the given property. The possible values for the pair $(X(\omega), X_*(\omega))$ are given by $(n, \eta) \in S(N, N') := \{(0, 0), (N' + 1, 0)\} \cup (\{N, N + 2, \ldots, N'\} \times \{0, 1\})$. Thus, given $N$ and $N'$, from the generated CMC Monte Carlo polygon data we can obtain a sequence of $M$-tuples of ordered pairs $(X(\omega_i^{(t)}), X_*(\omega_i^{(t)})) = (n_i^{(t)}, \eta_i^{(t)}) \in S(N, N')$, for $i = 1, \ldots, M$, $t = 1, \ldots, T$.

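The bookkeeping in (A.3)-(A.8) is easy to mirror in code. In the Python sketch below, the function names are hypothetical, lengths are assumed even (as for lattice polygons), and a polygon below the interval is recorded as $(0, 0)$ and one above it as $(N' + 1, 0)$.

```python
def indicator_1(n, N):
    """I^(1)(n): below the restricted interval. n = 0 is included because X
    records every length below N as 0 (assumption consistent with S(N, N'))."""
    return 1 if 0 <= n < N else 0

def indicator_2(n, N, N_prime):
    """I^(2)(n): inside the restricted interval, N <= n <= N'."""
    return 1 if N <= n <= N_prime else 0

def indicator_3(n, N_prime):
    """I^(3)(n): above the restricted interval, n > N'."""
    return 1 if n > N_prime else 0

def observe(length, has_property, N, N_prime):
    """Map a polygon's length |omega| and flag I_*(omega) to (X(omega), X_*(omega))."""
    x = indicator_2(length, N, N_prime) * length + indicator_3(length, N_prime) * (N_prime + 1)
    x_star = indicator_2(length, N, N_prime) * int(has_property)
    return x, x_star

def outcome_space(N, N_prime):
    """S(N, N'): all possible (n, eta) pairs, using even lengths N, N+2, ..., N'."""
    inside = {(n, eta) for n in range(N, N_prime + 1, 2) for eta in (0, 1)}
    return {(0, 0), (N_prime + 1, 0)} | inside

print(observe(14, True, 20, 30), observe(24, True, 20, 30), observe(40, True, 20, 30))
# (0, 0) (24, 1) (31, 0)
```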
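For $N = N_{\min}$ and $N' = N_{\max}$, consider any sequence of $T$ $M$-tuples of pairs $(n_i^{(t)}, \eta_i^{(t)}) \in S(N, N')$, $i = 1, \ldots, M$, $t = 1, \ldots, T$. The model log-likelihood, $\ell'_T$, for this sequence as an outcome of the CMC Monte Carlo algorithm can be written in terms of the $(M + 6)$ unknown parameters ($\kappa_0$, $\alpha_*$, $h_*$, $\alpha_{\bar{*}}$, $h_{\bar{*}}$, $\Delta = A_*/A_{\bar{*}}$, and $Q(\beta_i) = 1 - Q_{N_{\min}}(\beta_i)$, for $i \in \{1, 2, \ldots, M\}$) as follows [7]:

$$
\begin{aligned}
\ell'_T := T' \sum_{i=1}^{M} \Big[\, & \big\langle I^{(2)}(n_i) \log w(n_i) \big\rangle_T + (\kappa_0 + \beta_i) \big\langle I^{(2)}(n_i)\, n_i \big\rangle_T \\
& + \alpha_* \big\langle \eta_i \log(n_i + h_*) \big\rangle_T + \alpha_{\bar{*}} \big\langle [I^{(2)}(n_i) - \eta_i] \log(n_i + h_{\bar{*}}) \big\rangle_T \\
& + \langle \eta_i \rangle_T \log \Delta + \big\langle I^{(1)}(n_i) \big\rangle_T \log\big(1 - Q(\beta_i)\big) \\
& + \big\langle I^{(3)}(n_i) \big\rangle_T \log\big(Q^{(3)}_{\bar{*}}(\beta_i) + \Delta\, Q^{(3)}_{*}(\beta_i)\big) \\
& + \big\langle I^{(2,3)}(n_i) \big\rangle_T \Big( \log Q(\beta_i) - \log\big(Q^{(2,3)}_{\bar{*}}(\beta_i) + \Delta\, Q^{(2,3)}_{*}(\beta_i)\big) \Big) \Big], \qquad \text{(A.9)}
\end{aligned}
$$

where $I^{(2,3)} = I^{(2)} + I^{(3)}$,

$$Q^{(3)}_{\bullet}(\beta) := \sum_{n > N_{\max}} w(n)\,(n + h_{\bullet})^{\alpha_{\bullet}}\,e^{(\kappa_0 + \beta) n}, \qquad \text{(A.10)}$$

$$Q^{(2,3)}_{\bullet}(\beta) := \sum_{n \geq N_{\min}} w(n)\,(n + h_{\bullet})^{\alpha_{\bullet}}\,e^{(\kappa_0 + \beta) n}, \qquad \text{(A.11)}$$

$\bullet \in \{*, \bar{*}\}$, $w(n) = (n - 6)\,n^q$, and, for any function $g$ defined on $S(N_{\min}, N_{\max})$,

$$\langle g(n_i, \eta_i) \rangle_T := \frac{1}{T} \sum_{t=1}^{T} g(n_i^{(t)}, \eta_i^{(t)}). \qquad \text{(A.12)}$$

Note that the sample averages in $\ell'_T$ are based on all $T$ sample data points and are given by equation (A.12). Also note that $\ell'_T$ is a function only of the $(M + 6)$ parameters $\kappa_0$, $\alpha_*$, $h_*$, $\alpha_{\bar{*}}$, $h_{\bar{*}}$, $\Delta$ and $Q(\beta_i)$, for $i \in \{1, 2, \ldots, M\}$. The factor $T'$ in front accounts for the fact that the generated data are not necessarily independent. Following the Berretti-Sokal method [39], to compensate for the lack of independence in the sample data, the log-likelihood function obtained under the assumption of independence can be rescaled according to the number of "essentially independent" data points, $T'$, by multiplying it by $T'/T$. The result, in this case, is the log-likelihood function defined above.

To obtain the MLEs for the parameters $\kappa_0$, $\alpha_*$, $h_*$, $\alpha_{\bar{*}}$, $h_{\bar{*}}$, $\Delta$ and $Q(\beta_i)$, for $i \in \{1, 2, \ldots, M\}$, $\ell'_T$ is differentiated with respect to each of these $M + 6$ parameters, each of the resulting partial derivatives is set to zero, and the resulting system of equations is solved simultaneously. For $1 \leq i \leq M$, setting $\partial \ell'_T / \partial Q(\beta_i) = 0$ and solving for $Q(\beta_i)$ yields, since $I^{(1)}(n) + I^{(2,3)}(n) = 1$ for every observed $n$, the following MLEs:

$$\hat{Q}(\beta_i) = \big\langle I^{(2,3)}(n_i) \big\rangle_T, \qquad i \in \{1, 2, \ldots, M\}. \qquad \text{(A.13)}$$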
In order to obtain MLEs for the remaining six parameters ($\kappa_0$, $\alpha_*$, $h_*$, $\alpha_{\bar{*}}$, $h_{\bar{*}}$, and $\Delta$), the MLEs from equation (A.13) are substituted into the system of equations to yield a new system of six equations. The new system is solved numerically for the MLEs using the Newton-Raphson method.

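The structure of (A.9) and the closed form (A.13) can be checked numerically. The Python sketch below evaluates one chain's contribution to (A.9), assuming $\Delta = A_*/A_{\bar{*}}$, tail sums (A.10)-(A.11) taken over even lengths and truncated at a hypothetical cutoff `n_cut`, and $T' = T$ (no Berretti-Sokal rescaling). All function names and numbers are made up for illustration; this is not the authors' code, and it omits the Newton-Raphson solve for the six remaining parameters.

```python
import math

def weight(n, q):
    """w(n) = (n - 6) * n**q, the length-dependent factor in (A.9)-(A.11)."""
    return (n - 6) * n ** q

def tail_sums(N, N_prime, beta, kappa0, alpha, h, q, n_cut):
    """Return (Q^(3), Q^(2,3)) for one class (* or its complement): sums of
    w(n) (n + h)^alpha e^{(kappa0 + beta) n} over even n > N' and even n >= N,
    truncated at n_cut (the terms decay because beta < -kappa0)."""
    q3 = q23 = 0.0
    for n in range(N, n_cut + 1, 2):
        term = weight(n, q) * (n + h) ** alpha * math.exp((kappa0 + beta) * n)
        q23 += term
        if n > N_prime:
            q3 += term
    return q3, q23

def chain_loglik(pairs, beta, Q, params, N, N_prime, q, n_cut=4000):
    """Chain i's contribution to (A.9), summed over its observations (T' = T)."""
    kappa0, a_s, h_s, a_b, h_b, delta = params
    q3_s, q23_s = tail_sums(N, N_prime, beta, kappa0, a_s, h_s, q, n_cut)
    q3_b, q23_b = tail_sums(N, N_prime, beta, kappa0, a_b, h_b, q, n_cut)
    log_norm = math.log(q23_b + delta * q23_s)
    total = 0.0
    for n, eta in pairs:
        if n == 0:  # I^(1): length below N
            total += math.log(1.0 - Q)
        elif n == N_prime + 1:  # I^(3): length above N'
            total += math.log(Q) + math.log(q3_b + delta * q3_s) - log_norm
        else:  # I^(2): length n in [N, N']
            total += math.log(Q) - log_norm
            total += math.log(weight(n, q)) + (kappa0 + beta) * n
            if eta == 1:  # polygon has property *
                total += math.log(delta) + a_s * math.log(n + h_s)
            else:
                total += a_b * math.log(n + h_b)
    return total

def q_mle(pairs):
    """(A.13): the MLE of Q(beta_i) is the fraction of observations with length >= N."""
    return sum(1 for n, _ in pairs if n != 0) / len(pairs)

# Made-up observations for one chain with N = 20, N' = 30, and made-up parameters.
pairs = [(0, 0), (22, 1), (24, 0), (31, 0), (26, 1), (0, 0), (28, 0), (30, 1)]
params = (1.5, 1.0, 0.5, 1.2, 0.0, 0.2)  # kappa0, alpha_*, h_*, alpha_bar, h_bar, Delta
Q_hat = q_mle(pairs)  # 6 of the 8 observations have n >= N, so 0.75
print(Q_hat, chain_loglik(pairs, -1.6, Q_hat, params, 20, 30, q=1.0))
```

Because $Q(\beta_i)$ enters only through $\langle I^{(1)}\rangle_T \log(1 - Q) + \langle I^{(2,3)}\rangle_T \log Q$, moving $Q$ away from `Q_hat` with the other parameters fixed can only lower the value returned by `chain_loglik`.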
In practice, the simulated data are used to select appropriate boundaries, $N$ and $N'$, for the restricted polygon-length interval. For a given property $*$, we first estimate, based on the statistical "reliability" of the generated polygon-length data, a value for $N' = N_{\max}$. Then a choice $N = N_{\min}$ is obtained as follows. Given any $N$, let $\hat{\kappa}_0(N)$ and $\hat{\alpha}_*(N)$ be the CMC MLE estimates for $\kappa_0$ and $\alpha_*$, respectively. Then $N_{\min}$ is estimated to be the first value of $N$ in $\{14, 16, 18, \ldots\}$ such that, for all $m$ with $N \leq m < N_{\max}$, $|\hat{\kappa}_0(m) - \hat{\kappa}_0(m + 2)| < 0.000001$ and $|\hat{\alpha}_*(m) - \hat{\alpha}_*(m + 2)| < 0.0001$. In other words, we choose $N_{\min}$ so that the estimates $\hat{\kappa}_0(N)$ and $\hat{\alpha}_*(N)$ are essentially constant for all $N \geq N_{\min}$. This is akin to the so-called "flatness" region discussed in [39]. Full details of the methods for choosing $N_{\max}$ and $N_{\min}$ are
given in [7]. 
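The flatness rule for choosing $N_{\min}$ can be sketched as follows. The dictionaries of estimates are hypothetical inputs (in practice each entry would come from a separate CMC MLE fit with lower boundary $N$), and the default tolerances are the values quoted above.

```python
def choose_n_min(kappa_hat, alpha_hat, n_max, start=14, tol_kappa=1e-6, tol_alpha=1e-4):
    """Return the first even N >= start such that, for every even m with
    N <= m < n_max, the estimates at m and m + 2 differ by less than the tolerances.

    kappa_hat and alpha_hat map an even lower boundary N to the estimates
    kappa_0(N) and alpha_*(N); they must be defined for all even N up to n_max.
    """
    for N in range(start, n_max, 2):
        flat = all(
            abs(kappa_hat[m] - kappa_hat[m + 2]) < tol_kappa
            and abs(alpha_hat[m] - alpha_hat[m + 2]) < tol_alpha
            for m in range(N, n_max, 2)
        )
        if flat:
            return N
    return None  # no flat region found below n_max

# Synthetic estimates: kappa_0 settles at N = 30, alpha_* at N = 34.
Ns = range(14, 61, 2)
kappa_est = {N: 1.5 + (0.001 if N < 30 else 0.0) for N in Ns}
alpha_est = {N: 0.75 + (0.01 if N < 34 else 0.0) for N in Ns}
print(choose_n_min(kappa_est, alpha_est, n_max=60))  # 34
```

Requiring both estimates to be flat means the later of the two settling points determines $N_{\min}$.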